Restricted exchangeable partitions and embedding 
of associated hierarchies in continuum random trees 






Bo Chen, Matthias Winkel

November 30, 2009 



Abstract 

We introduce the notion of a restricted exchangeable partition of N and study natural 
classes of such partitions. We obtain integral representations, study associated coalescents 
and fragmentations, embeddings into continuum random trees and convergence to such limit 
trees. As an application, we deduce from the general theory developed here a particular 
limit result conjectured previously for Ford's alpha model and its non-binary extension, the 
alpha-gamma model, where restricted exchangeability arises naturally. 

1 Introduction

Following de Finetti and Kingman, we call a measure on the space $\mathcal{P}_B$ of partitions of $B \subseteq \mathbb{N}$
exchangeable if it is invariant under the natural action on $\mathcal{P}_B$ of the symmetric group on $B$;
and a random partition is called exchangeable if its distribution is exchangeable. For a partition
$\pi = \{\pi_i, i \in \mathbb{N}\}$, each non-empty $\pi_i \subseteq B$ is called a block of $\pi$. When $\pi$ has only finitely many
blocks, we often omit $\emptyset$ from $\pi$. We arrange the blocks of $\pi$ in the order of least element, i.e.
$\min \pi_i < \min \pi_j$ for every $i < j$, followed by $\emptyset$, with the convention $\min \emptyset = \infty$ to be definite.
For finite $\pi_i$, we consider the block size $\#\pi_i$. For $n \in \mathbb{N}$, we set $[n] = \{1, \ldots, n\}$. Then a measure
$\mu$ on $\mathcal{P} = \mathcal{P}_{\mathbb{N}}$ is exchangeable if and only if the discrete measures $\mu_n$ on $\mathcal{P}_n = \mathcal{P}_{[n]}$, given by

$$\mu_n(\{\pi\}) = \mu(\mathcal{P}^\pi), \quad \pi \in \mathcal{P}_n, \quad \text{where } \mathcal{P}^\pi = \{\Gamma \in \mathcal{P} : \Gamma|_n = \pi\} \text{ and } \Gamma|_n = \{\Gamma_i \cap [n], i \ge 1\}, \qquad (1)$$

are exchangeable for all $n \ge 1$. Furthermore, a measure $\mu_n$ on $\mathcal{P}_n$ is exchangeable if $\mu_n(\{\pi\}) = \mu_n(\{\pi'\})$
for all $\pi, \pi' \in \mathcal{P}_n$ with the same multiset of block sizes.
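
The restriction map $\Gamma \mapsto \Gamma|_n$ and the multiset of block sizes used above can be made concrete for finite partitions. The following is a minimal Python sketch; the function names `restrict` and `block_size_multiset` and the representation of a partition as a list of sets are illustrative assumptions, not notation from the paper.

```python
from collections import Counter

def restrict(partition, n):
    """Restriction to [n] = {1, ..., n}: intersect every block with [n],
    drop empty intersections, and list blocks in order of least element."""
    blocks = [frozenset(x for x in block if x <= n) for block in partition]
    return sorted((b for b in blocks if b), key=min)

def block_size_multiset(partition):
    """Multiset of block sizes, stored as a Counter {size: multiplicity}."""
    return Counter(len(block) for block in partition)

# A partition of [6] and its restriction to [4].
gamma = [{1, 4, 6}, {2, 3}, {5}]
pi = restrict(gamma, 4)                 # blocks {1, 4} and {2, 3}

# An exchangeable measure on partitions of [4] must give pi and pi_prime
# the same mass, because their block-size multisets agree.
pi_prime = [{1, 2}, {3, 4}]
same_sizes = block_size_multiset(pi) == block_size_multiset(pi_prime)
```

Here `same_sizes` only tests the hypothesis of the exchangeability criterion; it says nothing about any particular measure.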

Several weaker forms of exchangeability have been studied in the literature, notably Pitman's
partial exchangeability [28] and Gnedin's constrained exchangeability [15]. See Section 3.1. We
introduce here a new weak form of exchangeability. We first call a measure $\mu$ on a subset
$\mathcal{S}_B \subseteq \mathcal{P}_B$ of partitions of a finite $B \subset \mathbb{N}$ exchangeable on $\mathcal{S}_B$ if $\mu(\{\pi\}) = \mu(\{\pi'\})$ for all
$\pi, \pi' \in \mathcal{S}_B$ with the same multiset of block sizes. Now consider the infinite case.

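Definition 1 Let $\mathcal{S} \subseteq \mathcal{P}$. Consider $\mathcal{S}_n = \{\pi \in \mathcal{P}_n : \mathcal{P}^\pi \subseteq \mathcal{S}\}$, $n \ge 1$. We call a measure $\mu$ on $\mathcal{S}$
exchangeable on $\mathcal{S}$ if the restrictions to $\mathcal{S}_n$ of the measure $\mu_n$ given by (1) are exchangeable on
$\mathcal{S}_n$, $n \ge 1$, and if $\{\Gamma \in \mathcal{S} : \Gamma|_n \notin \mathcal{S}_n \text{ for all } n \ge 1\}$ is a $\mu$-null set.

A measure $\mu$ on $\mathcal{P}$ is called restricted exchangeable if $\mathcal{P}$ can be decomposed into disjoint
measurable $\mathcal{P}^j$, $j \ge 0$, so that the restrictions of $\mu$ to $\mathcal{P}^j$ are finite and exchangeable on $\mathcal{P}^j$.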

Remark 2 (i) Alternatively, we might include in $\mathcal{S}_n$ all partitions $\pi$ of $\mathcal{P}_n$ whose associated
cylinder sets $\mathcal{P}^\pi$ intersect $\mathcal{S}$. When this makes a difference, some of the $\mathcal{P}^j_n$, $j \ge 0$, in the
definition of restricted exchangeability would not be disjoint, giving a different notion that
seems less natural to us and that we do not propose to investigate further in this paper.

(ii) If we dropped the requirement that the parts of $\mathcal{S}$ missed by all $\mathcal{S}_n$, $n \ge 1$, form a $\mu$-null
set, then all (finite) measures on $\mathcal{P}$ would be restricted exchangeable; we could choose
$\mathcal{P}^j = \{\Gamma_j\}$, $j \ge 1$, $\mathcal{P}^0 = \mathcal{P} \setminus \{\Gamma_j, j \ge 1\}$ for a countable subset $\{\Gamma_j, j \ge 1\} \subset \mathcal{P}$ that is
dense for the metric $d(\Gamma, \Gamma') = 2^{-\inf\{n \ge 1 \,:\, \Gamma|_n \ne \Gamma'|_n\}}$, because then $\mathcal{P}^0_n = \emptyset$ for all $n \ge 1$.

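It will be useful to consider Kingman's branching graph $(\mathcal{K}, E)$ with $\mathcal{K} = \bigcup_{n \ge 0} \mathcal{P}_n$ rooted at
the unique element $\{\emptyset\} \in \mathcal{P}_0$, equipped with the directed edge relation $(\pi', \pi) \in E$ if $\pi' \in \mathcal{P}_n$,
$\pi \in \mathcal{P}_{n+1}$ with $\pi' = \pi \cap [n]$ for some $n \ge 0$, cf. [23]. Then $b : \mathcal{P} \to \mathcal{K}^{\mathbb{N}}$ given by $b(\Gamma) = (\Gamma|_n, n \ge 0)$
is an injection onto the set $\mathcal{Q} \subset \mathcal{K}^{\mathbb{N}}$ of infinite chains starting at $\{\emptyset\}$. We write $\pi' \preceq \pi$ if $\pi' \in \mathcal{P}_n$
and $\pi' = \pi \cap [n]$. For $\pi \in \mathcal{K}$, we denote by $\mathcal{K}^\pi = \{\pi' \in \mathcal{K} : \pi \preceq \pi'\}$ the cylinder set of $\pi$ in
$\mathcal{K}$. We consider subgraphs $\mathcal{E} \subseteq \mathcal{K}$ of $(\mathcal{K}, E)$ that, without further mentioning, are connected and
contain $\{\emptyset\}$. For a subgraph $\mathcal{E}$, all connected components of $\mathcal{K} \setminus \mathcal{E}$ are of the form $\mathcal{K}^\pi$ for some
$\pi \in \mathcal{K}$; indeed they are $\{\mathcal{K}^\pi : \pi \in \mathcal{C}_{\mathcal{E}}\}$, where $\mathcal{C}_{\mathcal{E}} = \{\pi \in \mathcal{K} \setminus \mathcal{E} : (\pi', \pi) \in E \text{ for some } \pi' \in \mathcal{E}\}$.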
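
The edge relation of Kingman's branching graph can be illustrated on small partitions: the successors of a partition of $[n]$ are obtained by adding $n+1$ to an existing block or by opening a new singleton block. The sketch below is only an illustration; the function name `children` and the list-of-sets representation are assumptions made here.

```python
def children(pi, n):
    """All partitions of [n+1] whose restriction to [n] is pi.

    pi is a list of blocks (sets) partitioning {1, ..., n}; the empty list
    stands for the root of the graph (n = 0)."""
    new = n + 1
    result = []
    # Option 1: the new element joins one of the existing blocks.
    for i in range(len(pi)):
        child = [set(block) for block in pi]
        child[i].add(new)
        result.append(child)
    # Option 2: the new element forms a singleton block.
    result.append([set(block) for block in pi] + [{new}])
    return result

# A partition of [3] with two blocks has three successors in the graph.
succ = children([{1, 2}, {3}], 3)
```

Iterating `children` from the root enumerates the levels of the graph, so level $n$ has as many vertices as there are partitions of $[n]$.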

Let $\mathcal{S}^\downarrow = \{s = (s_i, i \ge 1) : s_1 \ge s_2 \ge \ldots \ge 0, \sum_{i \ge 1} s_i \le 1\}$. For $s \in \mathcal{S}^\downarrow$, Kingman's paintbox
[24] is the exchangeable distribution $\kappa_s$ on $\mathcal{P}$ of the value partition induced by independent
random variables $(\xi_r, r \ge 1)$ with respective distributions

$$P(\xi_r = i) = s_i, \quad i \ge 1, \qquad P(\xi_r = -r) = s_0 := 1 - \sum_{i \ge 1} s_i.$$
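
The paintbox construction can be simulated directly for its restriction to $[n]$: element $r$ receives colour $i \ge 1$ with probability $s_i$ and otherwise the private colour $-r$, and blocks are the classes of equal colour. The following Python sketch assumes a finite frequency vector `s`; the function name `sample_paintbox` and the optional `rng` argument are hypothetical conveniences.

```python
import random

def sample_paintbox(s, n, rng=None):
    """Sample the restriction to [n] of a paintbox partition.

    s is a finite sequence of non-negative frequencies with sum at most 1;
    the missing mass 1 - sum(s) produces singleton blocks ("dust")."""
    rng = rng or random.Random()
    blocks = {}
    for r in range(1, n + 1):
        u = rng.random()
        label = -r                    # private colour: r becomes a singleton
        acc = 0.0
        for i, s_i in enumerate(s, start=1):
            acc += s_i
            if u < acc:
                label = i             # r falls into paintbox colour i
                break
        blocks.setdefault(label, set()).add(r)
    # Blocks are the colour classes, listed in order of least element.
    return sorted(blocks.values(), key=min)

example = sample_paintbox((0.5, 0.3), 10, random.Random(0))
```

With `s = (1.0,)` every element lands in one block, and with an empty `s` all blocks are singletons, which matches the two extreme cases $s = (1, 0, \ldots)$ and $s = (0, 0, \ldots)$.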

For $\pi \in \mathcal{K}$ we introduce here modified paintboxes by conditioning on the cylinder set $\mathcal{P}^\pi = \{\Gamma \in
\mathcal{P} : \Gamma|_n = \pi\}$ of $\pi$ in $\mathcal{P}$, but note that this conditioning is degenerate in some cases. Specifically,
for $s \in \mathcal{S}^\downarrow$ let $m \ge 0$ be such that $s_m > s_{m+1} = 0$ (or $m = \infty$ if $s_i > 0$ for all $i \ge 1$), and suppose
that $\pi \in \mathcal{K}$ has $k$ non-empty blocks, of which $\ell$ are of size at least 2. Then we call admissible for
$(\pi, \pi', s)$, $\pi' \succeq \pi$, any collection $(i_1, \ldots, i_{k'})$ of indices that are

• distinct with $i_j \ge 1$ except that we allow $i_j = 0$ for any (also multiple) $j$ with $\#\pi'_j = 1$;
this applies if $s_0 > 0$ and $m \ge \ell$, or $s_0 = 0$ and $m \ge k$ (non-degenerate case);

• distinct with $i_j \ge 1$ except that we allow $i_j = 0$ for any $j$ with $\#\pi'_j = 1$ or $\#\pi'_j = \#\pi_j < r$
and also for $q - m$ of the $j$ with $\#\pi'_j = \#\pi_j = r$, where $(q, r)$ is such that $< m$ blocks of $\pi$
have size $\ge r + 1$ and $q \ge m$ have size $\ge r$; this applies otherwise (degenerate case).

The modified paintboxes are now given as measures on $\mathcal{P}^\pi \subseteq \mathcal{P}$, for $\pi' \in \mathcal{K}^\pi$, by

$$\kappa^\pi_s(\mathcal{P}^{\pi'}) = \frac{1}{Z^\pi_s} \sum_{(i_1, \ldots, i_{k'}) \text{ admissible for } (\pi, \pi', s)} s_0^{\#\{1 \le j \le k' \,:\, i_j = 0\} - (k - m)^+} \prod_{1 \le j \le k' \,:\, i_j \ne 0} s_{i_j}^{\#\pi'_j},$$

where $Z^\pi_s$ is the normalisation constant; we can drop $(k - m)^+ = (k - m) 1_{\{k > m\}}$ unless $s_0 = 0$.
For $\pi = \{\{1\}\}$, this is a well-known formula for Kingman's paintbox $\kappa_s = \kappa^{\{\{1\}\}}_s$, with $Z^{\{\{1\}\}}_s = 1$.

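Theorem 3 Let $\mu$ be a measure on $\mathcal{P}$. Then $\mu$ is restricted exchangeable if and only if there
are a subgraph $\mathcal{E} \subseteq \mathcal{K}$ and for each $\pi \in \mathcal{C}_{\mathcal{E}}$ a finite measure $\nu^\pi$ on $\mathcal{S}^\downarrow$ such that

$$\mu = \sum_{\pi \in \mathcal{C}_{\mathcal{E}}} \int_{\mathcal{S}^\downarrow} \kappa^\pi_s \, \nu^\pi(ds).$$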

Note that restricted exchangeable measures can be infinite, if $\mathcal{E}$ is infinite. In this case, we
can consider the pre-image $\mathcal{P}^0$ of the set of the infinite chains in $\mathcal{E}$ under the bijection $b : \mathcal{P} \to \mathcal{Q}$
introduced above. Then $\mu(\mathcal{P}^0) = 0$ and $\mu$ is finite on all compact subsets of $\mathcal{P} \setminus \mathcal{P}^0$, in the sense
induced by the metric topology on $\mathcal{P}$ described in Remark 2(ii); in particular $\mu$ is then $\sigma$-finite.
Indeed, such measures are locally finite in the sense of Vershik and Kerov [34].

Examples 4 (i) A natural class of restricted exchangeable measures can be obtained by
conditioning an exchangeable random partition $\Pi$ on $\{\Pi \cap [n] = \pi\}$ for some $\pi \in \mathcal{P}_n$. More
generally, for $\mathcal{E} = \bigcup_{k=0}^{n-1} \mathcal{P}_k$, we can use total masses of $\nu^\pi$ to specify $P(\Pi \cap [n] = \pi) = \nu^\pi(\mathcal{S}^\downarrow)$
and, given $\{\Pi \cap [n] = \pi\}$, specify asymptotic frequencies according to $\nu^\pi / \nu^\pi(\mathcal{S}^\downarrow)$, $\pi \in \mathcal{P}_n$.
We can use this idea to approximate any distribution $\mu$ on $\mathcal{P}$ by restricted exchangeable
distributions using $\nu^\pi(\mathcal{S}^\downarrow) = \mu(\mathcal{P}^\pi)$, $\pi \in \mathcal{P}_n$, and any asymptotic frequencies, $n \ge 1$.

(ii) For $B \subseteq \mathbb{N}$, let $\mathbf{1}_B$ be the trivial partition of a single block $B$. Dislocation measures are
measures on $\mathcal{P} \setminus \{\mathbf{1}_{\mathbb{N}}\}$, finite on compact subsets. See Section 3.2. We set $\mathcal{E} = \{\mathbf{1}_{[j]}, j \ge 1\}$
and note that the connected components of $\mathcal{K} \setminus \mathcal{E}$ are $\mathcal{K}^j = \mathcal{K}^{\{[j], \{j+1\}\}}$, $j \ge 1$, so that
$\mathcal{P}^j = \mathcal{P}^{\{[j], \{j+1\}\}}$, $j \ge 1$, is a natural decomposition of $\mathcal{P} \setminus \{\mathbf{1}_{\mathbb{N}}\}$. Bertoin's [6] (possibly
infinite) exchangeable dislocation measures are clearly exchangeable on $\mathcal{P}^j$, $j \ge 1$; also the
finiteness condition on $\mathcal{P}^j$ holds since $\mathcal{P}^j$ is compact for all $j \ge 1$.

(iii) Ford's alpha model [H] and the alpha-gamma Markov branching model [11] are restricted
exchangeable, but not exchangeable. See Section 6 for the definition of these natural
examples of restricted exchangeable dislocation measures as well as conclusions.

(iv) For $B \subseteq \mathbb{N}$, let $\mathbf{0}_B$ be the partition of $B$ into singleton blocks $\{j\}$, $j \in B$. Coalescent
L-measures (allowing simultaneous multiple collisions) are measures on $\mathcal{P} \setminus \{\mathbf{0}_{\mathbb{N}}\}$, finite on
compact subsets. See Section 3. Set $\mathcal{E} = \{\mathbf{0}_{[n]}, n \ge 1\}$, $\mathcal{K}^{(k-1)(k-2)/2+i} = \mathcal{K}^{\pi_{ik}}$, where
$\pi_{ik} = (\mathbf{0}_{[k-1]} \setminus \{\{i\}\}) \cup \{\{i, k\}\}$, and $\mathcal{P}^{(k-1)(k-2)/2+i} = \mathcal{P}^{\pi_{ik}}$ for $1 \le i < k$. Schweinsberg's
[32] (possibly infinite) exchangeable L-measures are finite and exchangeable on $\mathcal{P}^j$, $j \ge 1$.

From Theorem 3 we deduce an integral representation for restricted exchangeable dislocation
measures. For simplicity we only allow as decomposition of $\mathcal{P}$ in Definition 1 the most relevant
and natural $\mathcal{P}^0 = \{\mathbf{1}_{\mathbb{N}}\}$ and $\mathcal{P}^j = \mathcal{P}^{\{[j], \{j+1\}\}}$, $j \ge 1$.

Corollary 5 Let $\kappa$ be a restricted exchangeable measure with subgraph $\mathcal{E} = \{\mathbf{1}_{[j]}, j \ge 1\}$. Then
for each $j \ge 1$, there are constants $c_j \ge 0$ and $k_j \ge 0$, and a measure $\nu_j$ on $\mathcal{S}^\downarrow$ with



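such that, for $e^{(j)} = \{\{j\}, \mathbb{N} \setminus \{j\}\}$ and $u^{(j)} = \{[j], \{j+1\}, \{j+2\}, \ldots\}$, $j \ge 1$,

$$\kappa = c_1 \delta_{e^{(1)}} + \sum_{j \ge 1} \left( c_j \delta_{e^{(j+1)}} + k_j \delta_{u^{(j)}} + \int_{\mathcal{S}^\downarrow} \kappa_s(\,\cdot \cap \mathcal{P}^j) \, \nu_j(ds) \right), \quad \text{where } \mathcal{P}^j = \mathcal{P}^{\{[j], \{j+1\}\}}.$$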



Bertoin [6] iterated random partitions from an exchangeable dislocation measure $\nu$ to create
exchangeable $\mathcal{P}$-valued fragmentation processes $(F^*(t), t \ge 0)$. Furthermore, the associated
closed exchangeable hierarchies $\mathcal{H}^* = \{F^*_i(t), i \ge 1, t \ge 0\}^{\rm cl}$ of blocks visited by such a process
are naturally interpreted as trees, see e.g. [26], and are often naturally embedded in $\alpha$-self-similar
continuum random trees (CRTs) [JjJ], say $(\mathcal{T}_{(\alpha,\nu)}, \mu)$, via samples $\Sigma^*_i \in \mathcal{T}_{(\alpha,\nu)}$, $i \ge 1$, conditionally
independent given $(\mathcal{T}_{(\alpha,\nu)}, \mu)$, each with distribution $\mu$, as $\mathcal{H}^* = \{C^*(\mathcal{T}^v), v \in \mathcal{T}_{(\alpha,\nu)}\}^{\rm cl}$, where
$C^*(\mathcal{T}^v) = \{i \in \mathbb{N} : \Sigma^*_i \in \mathcal{T}^v\}$ and $\mathcal{T}^v$ is the subtree of $\mathcal{T}_{(\alpha,\nu)}$ rooted at $v$. See Sections 3.2 and
4.1. Vice versa, such exchangeable hierarchies derived from fragmentation processes (or Markov
branching trees) can be used to construct CRTs as scaling limits [20]. In the second part of this
paper we carry out a similar programme for the restricted exchangeable case, starting from a
restricted exchangeable dislocation measure of the form identified in Corollary 5.

One of our main results is a general embedding result. This generalises [30, Theorem 4],
which treated Ford's alpha model and a binary two-parameter extension that, apart from the
alpha model, is not restricted exchangeable in the sense of Corollary 5.

Theorem 6 Let $\mathcal{H}$ be the hierarchy of a restricted exchangeable fragmentation such that



j>i V>i / 

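satisfies $\int_{\mathcal{S}^\downarrow} (1 - s_1) \, \nu(ds) < \infty$ and $\nu(s_0 > 0) = 0$, and such that $c_j = k_j = 0$ for all $j \ge 1$. Then
$\mathcal{H}$ can be embedded as $\mathcal{H} = \{C(\mathcal{T}^v), v \in \mathcal{T}_{(\alpha,\nu)}\}^{\rm cl}$ in an $\alpha$-self-similar CRT $\mathcal{T}_{(\alpha,\nu)}$ with dislocation
measure $\nu$, where $C(\mathcal{T}^v) = \{i \in \mathbb{N} : \Sigma_i \in \mathcal{T}^v\}$ for some (dependent) $\Sigma_i \in \mathcal{T}_{(\alpha,\nu)}$, $i \ge 1$.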

There is an integral representation for coalescent L-measures analogous to Corollary 5, see
Proposition 13 in Section 3.3. We leave open the question whether associated restricted
exchangeable coalescents can be embedded in the $\Lambda$-coalescent measure trees (and analogous trees
for $\Xi$-coalescents) in the sense of Greven et al. [17].

Let us turn to the convergence to CRTs. Just like partitions, hierarchies of $\mathbb{N}$ are uniquely
determined by their restrictions $T_n = \mathcal{H}|_n = \{B \cap [n], B \in \mathcal{H}\}$, which form here a consistent family
$(T_n, n \ge 1)$ of random trees with $T_n$ as vertex set and implicit edge set given by the parent-child
relation that assigns to each non-singleton $B \in T_n$ as children the maximal strict subsets in
$T_n$ that, by construction, form a partition of $B$. We can delabel these trees and add a root
vertex to obtain rooted combinatorial trees $T_n^\circ$, i.e. connected acyclic graphs with no degree-2
vertex, but some degree-1 vertices, one of which is the distinguished root. We can regard $T_n^\circ$ as
a path-connected metric space with unit distance between adjacent vertices, which are connected by
unit line segments. Notation $T_n^\circ / a$ is then understood as scaling all distances by a factor $a \in (0, \infty)$.

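In the exchangeable case, [20] obtain CRT convergence under a regular variation condition

$$\nu(s_1 \le 1 - \varepsilon) = \varepsilon^{-\alpha} \ell(1/\varepsilon) \text{ as } \varepsilon \downarrow 0, \text{ for some } \alpha \in (0, 1) \text{ and slowly varying } \ell, \qquad (2)$$

and a log-moment condition

$$\int_{\mathcal{S}^\downarrow} \sum_{i \ge 2} s_i |\log(s_i)|^{\varrho} \, \nu(ds) < \infty \text{ for some } \varrho > 0. \qquad (3)$$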

Theorem 7 If in the setting of Theorem 6 the dislocation measure $\nu$ satisfies (2) and (3), and
if $\nu_j = \nu_m$ for some $m \ge 1$ and all $j \ge m$, then

" r — > T(a,v) i n probability, in the Gromov-Hausdorff sense. 

In addition to the proofs of the main results formulated in the Introduction, the content
of this paper is as follows. Section 2 mainly proves Theorem 3 and Corollary 5. Section 3
includes a discussion of the relationship between restricted exchangeability, partial exchangeability
and constrained exchangeability, a more detailed introduction to restricted exchangeable
coalescents and fragmentations and associated hierarchies, and an integral representation analogous
to Corollary 5 for restricted exchangeable coalescents.

Sections 4 and 5 deal with the proofs of Theorems 6 and 7, but we also develop a general
method to sample leaves in non-binary self-similar CRTs in Section 4, while Section 5 studies
in some detail the embedding used to prove Theorem 6 in order to obtain individual estimates
for each leaf $\Sigma_i$, $i \ge 1$, where in the proof in [20] for the exchangeable case, consideration of a
single leaf $\Sigma^*$ in $(\mathcal{T}_{(\alpha,\nu)}, \mu)$ sampled from $\mu$ gives direct estimates for all $\Sigma^*_i$, $i \ge 1$. This includes
here estimates for a $p$th moment renewal theorem in Lemma 23 and an application to Gnedin's
constrained paintboxes in Lemma 24, both of which may be of independent interest. While we
build on [20], we do not repeat its content; rather, we focus on the new developments here
and then provide only a half-page sketch when the arguments of [20] can be applied to complete
the proof of Theorem 7, with the exception of Lemma 27, which deals with large deviations of
block numbers of partitions associated with subordinators and may be of independent interest.
Its proof is completely analogous to, but more general than, that in [20] and references therein, so we
have rewritten it for the present context and included it in an appendix. Significant intermediate
results for the proof of Theorem 7 include almost sure convergences of rescaled subtrees of $T_n$
spanned by $k$ leaves as first $n \to \infty$ in Proposition 28 and then also $k \to \infty$ in Formula (23).

Section 6 demonstrates how the main results obtained in this paper can be applied to the
alpha model, the alpha-gamma model and more general skewed Poisson-Dirichlet models that
we introduce here as a natural three-parameter family of restricted exchangeable fragmentation
models. We show in these examples that Theorems 6 and 7 typically refer to Markov branching
trees $T_n^\circ$ that are not sampling consistent, so that the theory developed in [20] does not even yield
convergence in distribution for these trees, where we here establish convergence in probability.



2 Integral representations, proof of Theorem 3 and Corollary 5

Our first aim is to understand exchangeability on subsets of the form $\mathcal{P}^\pi \subseteq \mathcal{P}$ for some $\pi \in \mathcal{P}_n$.
It is easy to show that the modified paintboxes are exchangeable on $\mathcal{P}^\pi$.

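Proposition 8 The modified paintbox $\kappa^\pi_s$ can be expressed in terms of any $\Gamma \in \mathcal{P}^\pi$ with asymptotic
frequencies $s$, provided that any blocks of $\Gamma$ with zero asymptotic frequency are either subsets
of $[n]$ or singletons, as

$$\kappa^\pi_s(\mathcal{P}^{\pi'}) = \lim_{r \to \infty} \frac{\#\{\pi'' \in \mathcal{P}^{\pi'}_r : \pi'' \sim \Gamma|_r\}}{\#\{\pi'' \in \mathcal{P}^{\pi}_r : \pi'' \sim \Gamma|_r\}},$$

where $\pi' \sim \pi''$ if $\pi'$ and $\pi''$ have the same multiset of block sizes.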

Proof. This proof is a refinement of the relevant part of the proof of [23, Theorem 3.1], Kerov's
proof of Kingman's paintbox representation of exchangeable partitions in $\mathcal{P}$, where we need to
take into account the distortions due to restriction to $\mathcal{P}^\pi$. We evaluate the right-hand side.
Numerator and denominator are easily calculated as

#{tt" € V? ■ tt" ~ T| r } = ^ ( ji-r i _u_ i "jit I / air I ) rT°° !' 

^ V# r nlr - # n V ■ ■ • ># r i fe /lr - 7T fe /, Mothers Ir/ [\j=l Pj- 

where $\sum$ is over indices $(i_1, \ldots, i_{k'})$ such that $\#\Gamma_{i_j}|_r - \#\pi'_j \ge 0$ for all $j \in [k']$, $\#\Gamma_{\rm others}|_r$ is the
vector of all $\#\Gamma_i|_r$, $i \ge 1$, except $\#\Gamma_{i_1}|_r, \ldots, \#\Gamma_{i_{k'}}|_r$, and $p_j$ is the number of blocks of $\Gamma|_r$ with $j$
elements, $j \ge 1$. Now first assume $s_0 = 1 - \sum_{i \ge 1} s_i = 0$; then we deduce that the limit exists
and is given by $Z^{\pi',\pi}_s / Z^{\pi,\pi}_s$, where



El T—n \ 

\#r u | r -#7r' ...,#r, : , lr-#7r',,#r nthnrs |j 



d 



giry _ y x # r nlr- #■*[,. ..,#T ik \ r -#-K' k ,,#r otiL era\r/ _ -p-r #7r'. 

^ ^. — j r a / j X X 

V#ri| r ,#r2|r,~/ (ii,...,i k ,) admissible for (n,n' ,s) j:si->0 

where $d$ is the minimal sum over all those block sizes that have to be mapped to zero-limiting-frequency
blocks (if any; $d > 0$ only in the degenerate case); in fact this power $d$ is such that
terms with higher than the minimal sum actually become negligible, and we identify $\kappa^\pi_s(\mathcal{P}^{\pi'})$.

If $s_0 > 0$, blocks of zero limiting frequency need to be treated differently, because their union
$\Gamma_0$ now has a limiting frequency, and a union $\overline{\pi}'_0$ of blocks of $\pi'$ can indeed be associated with
$\Gamma_0$. Specifically, we calculate a first factor as



3 

3 ' 



^ v#r -5?' #r in |r-#5Fi,-,#r ir ,| r -#5r' ,#r othcrs |,J 



fSs, r~ — ~ r - ) = ^ s o II" 

^# r o|r,#ri|r,#r2|r,---'' (ij ,iy ) admissible for (7T,7r' ,s) j=l 
but then need to also count the further partitions of the block of size $\#\Gamma_0|_r$. This yields for
$\overline{\pi}'_0 = \pi'_{j_1} \cup \cdots \cup \pi'_{j_b}$ a positive limit factor if $d = \#\overline{\pi}'_0 - b$ is minimal, which we then calculate as



^ V#r,;, | r -#7r',.,...,#r,;J r -#7rl,l,...,ir _^ 

= s > 



v#r 4l | r -#^ #r ifc | r -#^ 
lim - 



(#r | r )! 

note that the number of available indices is asymptotically equivalent to $\#\Gamma_0|_r \sim s_0 r$, so that
the sum contains $\sim (\#\Gamma_0|_r)^b$ terms, and this contributes significantly to the asymptotics of the
numerator. Finally we sum over the different choices of $\overline{\pi}'_0$ with $\#\overline{\pi}'_0 - b = d$ to identify $\kappa^\pi_s(\mathcal{P}^{\pi'})$.

□ 

With these representations of the modified paintboxes, we now obtain the integral representation
of general measures that are exchangeable on $\mathcal{P}^\pi$ for some $\pi \in \mathcal{P}_n$.

Proposition 9 Let $\mu$ be a finite measure, exchangeable on $\mathcal{P}^\pi$ for some $\pi \in \mathcal{P}_n$. Then there is
a finite measure $\nu$ on $\mathcal{S}^\downarrow$ such that $\mu = \int_{\mathcal{S}^\downarrow} \kappa^\pi_s \, \nu(ds)$.



Proof. This proof uses a combination of the martingale method due to Vershik and Kerov [34,
Theorem 2] and the de Finetti method used by Aldous [1]. W.l.o.g., $\mu$ is a probability measure.
Let $\Pi \sim \mu$ for a probability measure $\mu$ that is exchangeable on $\mathcal{P}^\pi$. Consider the process

$$X_r = \frac{\#\{\pi'' \in \mathcal{P}^{\pi'}_r : \pi'' \sim \Pi|_r\}}{\#\{\pi'' \in \mathcal{P}^{\pi}_r : \pi'' \sim \Pi|_r\}}, \quad r \ge n',$$

in the decreasing filtration $\mathcal{F}_r$ generated by the block sizes of $\Pi|_{r'}$, $r' \ge r$. By exchangeability,
$X_r$ only depends on its block sizes $B_r$ and is hence $\mathcal{F}_r$-measurable, and $E[X_r | \mathcal{F}_{r+1}]$ only depends
on $X_{r+1}$. For a multiset $b$ of block sizes, denote by $m(b)$ (resp. $m'(b)$) the number of chains in
$\mathcal{K}^\pi$ from $\pi$ (resp. in $\mathcal{K}^{\pi'}$ from $\pi'$) to block sizes $b$. By exchangeability, each of these chains is
equally likely. For block sizes $B_{r+1} = b_{r+1}$, we denote by $m(b_r, b_{r+1})$ the number of chains from a
specific partition with block sizes $b_r$ to any partition with block sizes $b_{r+1}$; then $m(b_r) m(b_r, b_{r+1})$
chains from $\pi$ to block sizes $b_{r+1}$ pass via block sizes $b_r$. With this notation, we have $X_r =
m'(B_r)/m(B_r)$. Then

$$E[X_r | B_{r+1} = b_{r+1}] = \sum_{b_r} \frac{m(b_r) m(b_r, b_{r+1})}{m(b_{r+1})} \frac{m'(b_r)}{m(b_r)} = \frac{1}{m(b_{r+1})} \sum_{b_r} m'(b_r) m(b_r, b_{r+1}) = \frac{m'(b_{r+1})}{m(b_{r+1})}$$

for all admissible $b_{r+1}$ shows that $(X_r, r \ge n')$ is a bounded martingale and hence converges a.s.

On the other hand, de Finetti's theorem yields that asymptotic frequencies exist $\mu$-a.s.
Specifically, consider a partition $\Pi$ with distribution $\mu$ and, independently, a sequence $U_i$, $i \ge 1$,
of auxiliary independent uniform random variables. Then the random variables

$$X_j = U_i \quad \text{if } j \in \Pi_i, \quad j \ge n + 1,$$

are exchangeable. By de Finetti's theorem, they are conditionally i.i.d. and the atom sizes $S_i$
of the random limiting distribution in size-biased order satisfy

$$S_i = \lim_{r \to \infty} \frac{\#\{j \in \{n+1, \ldots, n+r\} : X_j = U_i\}}{r} = \lim_{r \to \infty} \frac{\#\Pi_i \cap [r]}{r}.$$

Clearly, the latter limit does not depend on the auxiliary variables $(U_i, i \ge 1)$, so asymptotic
frequencies exist $\mu$-a.s. Furthermore, $\mu$-a.e. partition is such that blocks with zero asymptotic
frequency either only involve elements of $[n]$ or are singletons. Denote by $\nu$ the distribution on
$\mathcal{S}^\downarrow$ of the asymptotic frequencies $(S_i, i \ge 1)$ of $\Pi$ rearranged into decreasing order.
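This means that $\mu$ is concentrated on those partitions for which Proposition 8 yields modified
paintbox representations, and we see that $X_r \to \kappa^\pi_S(\mathcal{P}^{\pi'})$ a.s., where $S \sim \nu$; but $(X_r, r \ge n')$ is
a bounded martingale, so exchangeability on $\mathcal{P}^\pi$ yields

$$\int_{\mathcal{S}^\downarrow} \kappa^\pi_s(\mathcal{P}^{\pi'}) \, \nu(ds) = E[\kappa^\pi_S(\mathcal{P}^{\pi'})] = E[X_{n'}] = \frac{P(\Pi|_{n'} \sim \pi')}{\#\{\pi'' \in \mathcal{P}^\pi_{n'} : \pi'' \sim \pi'\}} = \mu(\mathcal{P}^{\pi'}).$$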

□ 

This proof raises the question whether we could have done without the martingale method
or without the de Finetti argument, as can be done in the exchangeable case. In fact, to avoid
de Finetti, we would have to generalise Proposition 8 to ensure that all $\Gamma$ for which the limits
in Proposition 8 exist converge to modified paintboxes, which seems more difficult given the
exceptional non-singleton sets of zero limiting frequency. On the other hand, our de Finetti
argument only identifies the distribution of $\Pi$ restricted to $\{n+1, n+2, \ldots\}$ and gives little
information about the conditional distribution of how the blocks of $\pi$ attach themselves to such
paintboxes. We have not found a simple and direct argument to see why the modified paintboxes
describe the only way to attach $\pi$ in an exchangeable way.

Now recall that Theorem 3 states that restricted exchangeable measures on $\mathcal{P}$ are precisely
those of the form $\mu = \sum_{\pi \in \mathcal{C}_{\mathcal{E}}} \int_{\mathcal{S}^\downarrow} \kappa^\pi_s \, \nu^\pi(ds)$.

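Proof of Theorem 3. First consider $\mu = \sum_{\pi \in \mathcal{C}_{\mathcal{E}}} \int_{\mathcal{S}^\downarrow} \kappa^\pi_s \, \nu^\pi(ds)$. Take as $\mathcal{P}^0$ the pre-image under
the bijection $b : \mathcal{P} \to \mathcal{Q}$ of the set of infinite chains in $\mathcal{E}$. Then $\mu(\mathcal{P}^0) = 0$ because none of the
measures $\int_{\mathcal{S}^\downarrow} \kappa^\pi_s \, \nu^\pi(ds)$ have mass in $\mathcal{P}^0$. Indeed, the decomposition

$$\mathcal{P} = \mathcal{P}^0 \cup \bigcup_{\pi \in \mathcal{C}_{\mathcal{E}}} \mathcal{P}^\pi$$

is into disjoint measurable sets, and the restrictions of $\mu$ are finite and exchangeable on $\mathcal{P}^\pi$.

Conversely, let $\mu$ be any restricted exchangeable measure on $\mathcal{P}$ and $\mathcal{P}^j$, $j \ge 0$, a measurable
decomposition so that the restrictions of $\mu$ to $\mathcal{P}^j$ are finite and exchangeable on $\mathcal{P}^j$. We first
show that each of these restrictions of $\mu$ to $\mathcal{P}^j$ is of the form required. Fix $j \ge 0$ and consider
$\mathcal{P}^j_n = \{\pi \in \mathcal{P}_n : \mathcal{P}^\pi \subseteq \mathcal{P}^j\}$. Then we can define inductively

$$\mathcal{C}^j_1 = \mathcal{P}^j_1, \quad \mathcal{C}^j_{n+1} = \mathcal{C}^j_n \cup \Big( \mathcal{P}^j_{n+1} \setminus \bigcup_{\pi \in \mathcal{C}^j_n} \mathcal{K}^\pi \Big), \quad n \ge 1, \quad \mathcal{C}^j = \bigcup_{n \ge 1} \mathcal{C}^j_n,$$

and obtain, by construction, disjoint $\mathcal{P}^\pi$, $\pi \in \mathcal{C}^j$, and exchangeable restrictions to these cylinder
sets, which, by Proposition 9, can be represented as $\int_{\mathcal{S}^\downarrow} \kappa^\pi_s \, \nu^\pi(ds)$. Since furthermore $\mu(\{\Gamma \in
\mathcal{P}^j : \Gamma|_n \notin \mathcal{P}^j_n \text{ for all } n \ge 1\}) = 0$, we have

$$\mu(\,\cdot \cap \mathcal{P}^j) = \sum_{\pi \in \mathcal{C}^j} \int_{\mathcal{S}^\downarrow} \kappa^\pi_s \, \nu^\pi(ds).$$

Now set $\mathcal{C} = \bigcup_{j \ge 0} \mathcal{C}^j$ and $\mathcal{E}_{\mathcal{C}} = \bigcup_{n \ge 0} \{\pi \cap [n] : \pi \in \mathcal{C} \cap \mathcal{P}_{n+1}\}$. If we now set $\mathcal{E}$ to be the connected
subgraph generated by $\mathcal{E}_{\mathcal{C}}$ and $\{\emptyset\}$, then $\mathcal{E}$ differs from $\mathcal{E}_{\mathcal{C}}$ only by some $\pi' \preceq \pi \in \mathcal{E}_{\mathcal{C}}$ that do not
contribute to $\mathcal{C}_{\mathcal{E}} = \mathcal{C}$. This completes the proof. □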

The proof of Corollary 5 is now straightforward. The $\nu_j$ are not simply the $\nu^\pi$ for
$\pi = \{[j], \{j+1\}\}$, $j \ge 1$, but almost. On the one hand, we have moved atoms of $\nu^\pi$ in $(0, 0, \ldots)$
to a constant $k_j \ge 0$ and in $(1, 0, \ldots)$ to a constant $c_j \ge 0$. The corresponding modified
paintboxes are $\delta_{\{[j], \{j+1\}, \{j+2\}, \ldots\}}$ and $\delta_{\{\{j+1\}, \mathbb{N} \setminus \{j+1\}\}}$, respectively, except for $j = 1$, where the
modified paintbox is $\frac{1}{2}(\delta_{\{\{1\}, \mathbb{N} \setminus \{1\}\}} + \delta_{\{\{2\}, \mathbb{N} \setminus \{2\}\}})$. On the other hand, we have incorporated the
normalisation constants $Z^\pi_s$ of the modified paintboxes as densities into $\nu_j$ and can and do here
use restricted Kingman paintboxes $\kappa_s(\,\cdot \cap \mathcal{P}^j)$ rather than normalised modified paintboxes $\kappa^\pi_s$.

3 Basic results on restricted exchangeability and related notions

3.1 Partially exchangeable and constrained exchangeable partitions

Let us explore the connections between the restricted exchangeable partitions introduced in this
paper and other generalisations of exchangeability studied in the literature, notably partial
exchangeability and constrained exchangeability. Partially exchangeable partitions were introduced
by Pitman [28]. A measure $\mu_n$ on $\mathcal{P}_n$ is partially exchangeable if $\mu_n(\pi) = \mu_n(\pi')$ for all $\pi, \pi' \in \mathcal{P}_n$
with the same vector of block sizes in the order of least element. Partially exchangeable
measures are not restricted exchangeable, in general, nor vice versa. Specifically, $\pi = (\{1, 2\}, \{3, 4\})$
and $\pi' = (\{1, 3\}, \{2, 4\})$ have the same mass for partially exchangeable measures but not
necessarily for restricted exchangeable measures. Vice versa, consider $\pi = (\{1, 2, 3\}, \{4, 5\})$ and
$\pi' = (\{1, 2\}, \{3, 4, 5\})$. In fact, "the intersection" of the two concepts is exchangeability:

Proposition 10 A measure $\mu_n$ on $\mathcal{P}_n$ is exchangeable if and only if it is both partially
exchangeable and restricted exchangeable with decomposition $\mathcal{P}^1 = \mathcal{P}^{\{\{1\}, \{2\}\}}$, $\mathcal{P}^2 = \mathcal{P}^{\{\{1, 2\}\}}$.

Proof. The "only if" part follows straight from the definitions. For the "if" part, suppose that
$\pi, \pi' \in \mathcal{P}_n \setminus \{\mathbf{1}_{[n]}\}$ have the same multiset of block sizes. Let $\tilde{\pi}$ be such that, for blocks in order
of least element, $\tilde{\pi}_1 = (\pi_1 \cup \{\min \pi_2\}) \setminus \{2\}$ and $\tilde{\pi}_2 = (\pi_2 \setminus \{\min \pi_2\}) \cup \{2\}$, $\tilde{\pi}_j = \pi_j$, $j \ge 3$.
Similarly construct $\tilde{\pi}'$ from $\pi'$. By partial exchangeability $\mu_n(\tilde{\pi}) = \mu_n(\pi)$ and $\mu_n(\tilde{\pi}') = \mu_n(\pi')$.
But $\tilde{\pi}', \tilde{\pi} \in \mathcal{P}^{\{\{1\}, \{2\}\}}$, so by restricted exchangeability, we have $\mu_n(\tilde{\pi}') = \mu_n(\tilde{\pi})$. □
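
The two pairs of partitions discussed before Proposition 10 can be checked mechanically. The sketch below compares the block-size vector (relevant for partial exchangeability) with the block-size multiset inside a cell of the two-cell decomposition of Proposition 10 (relevant for restricted exchangeability with that decomposition). The helper names are illustrative assumptions; the code only tests hypotheses of the definitions, not any particular measure.

```python
def size_vector(pi):
    """Block sizes listed in the order of least element."""
    return [len(block) for block in sorted(pi, key=min)]

def size_multiset(pi):
    """Block sizes as a sorted list, i.e. a multiset."""
    return sorted(len(block) for block in pi)

def one_and_two_together(pi):
    """Cell of the decomposition in Proposition 10: True if 1 and 2 share
    a block, False if they lie in different blocks."""
    return any({1, 2} <= set(block) for block in pi)

def forced_equal_partial(pi, pi2):
    return size_vector(pi) == size_vector(pi2)

def forced_equal_restricted(pi, pi2):
    # Same cell of the decomposition and same multiset of block sizes.
    return (one_and_two_together(pi) == one_and_two_together(pi2)
            and size_multiset(pi) == size_multiset(pi2))

pair_a = ([{1, 2}, {3, 4}], [{1, 3}, {2, 4}])
pair_b = ([{1, 2, 3}, {4, 5}], [{1, 2}, {3, 4, 5}])
```

For `pair_a` only partial exchangeability forces equal mass, while for `pair_b` only restricted exchangeability (with this decomposition) does.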

Constrained exchangeable partitions were introduced by Gnedin [15]. Let $\rho = (\rho_k, k \ge 1)$ be
a fixed sequence of integers $\rho_k \ge 1$. Consider the set $\mathcal{P}^{\rho\text{-constr}}$ of partitions $\Gamma \in \mathcal{P}$ that are
constrained with respect to $\rho$ in the sense that each block $\Gamma_k$ contains the $\rho_k$ least elements
of $\bigcup_{j \ge k} \Gamma_j$ for every $k \ge 1$ with $\Gamma_k \ne \emptyset$. A measure $\mu$ on $\mathcal{P}$ is constrained exchangeable if
$\mu(\mathcal{P} \setminus \mathcal{P}^{\rho\text{-constr}}) = 0$ for some $\rho$, and if $\mu_n(\pi) = \mu_n(\pi')$ for all $\pi, \pi' \in \{\Gamma|_n : \Gamma \in \mathcal{P}^{\rho\text{-constr}}\}$
with the same multiset of block sizes and all $n \ge 1$. For $\rho = (1, 2, 1, \ldots)$, under a constrained
exchangeable measure, $\pi = \{\{1, 3\}, \{2, 4\}, \{5\}\}$ and $\pi' = \{\{1, 2\}, \{3, 4\}, \{5\}\}$ have the same
mass, but not necessarily under a restricted exchangeable measure. Vice versa, restrictions
to $\mathcal{P}^{\{[j], \{j+1\}\}}$ of a restricted exchangeable measure $\mu$ are constrained exchangeable if we take
$\rho = (j, 1, 1, \ldots)$, but as soon as $\mu$ gives positive mass to more than one $\mathcal{P}^{\{[j], \{j+1\}\}}$, $j \ge 1$,
constrained exchangeability in Gnedin's sense fails.
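
The constraint "block $k$ contains the least elements of the union of blocks $k, k+1, \ldots$" can be tested on finite partitions. The following Python sketch applies the condition literally to a partition of $[n]$; the function name `is_constrained`, the 0-based tuple `rho`, and the padding of `rho` with ones are assumptions made for illustration, and the treatment of blocks truncated by restriction is a simplification.

```python
def is_constrained(pi, rho):
    """Check the constraint on a finite partition pi (list of sets).

    Blocks are taken in order of least element; block number k (1-based)
    must contain the rho_k least elements of the union of blocks k, k+1, ...
    If fewer than rho_k elements remain, all of them are required."""
    blocks = sorted((set(block) for block in pi), key=min)
    for k, block in enumerate(blocks):
        remaining = sorted(set().union(*blocks[k:]))
        need = rho[k] if k < len(rho) else 1   # pad rho with ones
        if not set(remaining[:need]) <= block:
            return False
    return True

rho = (1, 2, 1)
pi = [{1, 3}, {2, 4}, {5}]
pi_prime = [{1, 2}, {3, 4}, {5}]
```

Both `pi` and `pi_prime` pass the check for this `rho`, in line with the example in the text, whereas a partition such as `[{1, 3}, {2, 5}, {4}]` fails because its second block misses the element 4.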

3.2 Hierarchies, ordered hierarchies, fragmentations and coalescents 

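Following [31, 33, 22], we call hierarchy of $B \subseteq \mathbb{N}$ any subset $\mathcal{H}_B$ of the power set of $B$ such
that $B \in \mathcal{H}_B$ and $\{k\} \in \mathcal{H}_B$ for all $k \in B$, and so that for every $A, A' \in \mathcal{H}_B$, either $A \cap A' = \emptyset$
or $A \subseteq A'$ or $A' \subseteq A$. To avoid trivialities, we also require $\emptyset \in \mathcal{H}_B$. We say that a hierarchy is closed if
for all sequences $(A_n, n \ge 1)$ in $\mathcal{H}_B$ that are increasing for the inclusion partial order, we have
$\bigcup A_n \in \mathcal{H}_B$, and if for all decreasing sequences $\bigcap A_n \in \mathcal{H}_B$. We say that a strict subset $A \subset A'$
is maximal in $\mathcal{H}_B$ if for all $A'' \in \mathcal{H}_B$ with $A \subseteq A'' \subseteq A'$ either $A = A''$ or $A'' = A'$. For finite
$B \subset \mathbb{N}$, the maximal subsets $A_1, \ldots, A_k$ of $B$ in $\mathcal{H}_B$ form a partition of $B$ and the restrictions
$\mathcal{H}_{A_i} = \mathcal{H}_B \cap A_i = \{A \cap A_i : A \in \mathcal{H}_B\}$ are hierarchies of $A_i$, $i \in [k]$. This is not always true for
infinite $B \subseteq \mathbb{N}$ and so it will be useful to note that a closed hierarchy $\mathcal{H}_B$ is uniquely determined
by its restrictions $\mathcal{H}_B \cap [n]$, $n \ge 1$, as $\mathcal{H}_B = \{A \subseteq B : A \cap [n] \in \mathcal{H}_B \cap [n] \text{ for all } n \ge 1\}$.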
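
For finite $B$ these notions are easy to experiment with. The sketch below checks the hierarchy axioms, extracts maximal strict subsets and restricts a hierarchy to $[n]$. It reads the nesting condition as "any two members are disjoint or nested", which is what allows distinct singletons to coexist; the function names are illustrative assumptions.

```python
from itertools import combinations

def is_hierarchy(family, B):
    """Check the hierarchy axioms for a finite family of subsets of B."""
    fam = {frozenset(a) for a in family}
    B = frozenset(B)
    if B not in fam or frozenset() not in fam:
        return False
    if any(frozenset({k}) not in fam for k in B):
        return False
    # Any two members must be disjoint or nested.
    return all(not (a & c) or a <= c or c <= a
               for a, c in combinations(fam, 2))

def maximal_strict_subsets(family, A):
    """Members of the family that are maximal among non-empty strict subsets of A."""
    fam = {frozenset(a) for a in family}
    A = frozenset(A)
    strict = [a for a in fam if a and a < A]
    return sorted((a for a in strict if not any(a < c for c in strict)), key=min)

def restrict_hierarchy(family, n):
    """Restriction to [n]: intersect every member with {1, ..., n}."""
    return {frozenset(x for x in a if x <= n) for a in family}

H = [set(), {1}, {2}, {3}, {4}, {1, 2}, {3, 4}, {1, 2, 3, 4}]
children_of_root = maximal_strict_subsets(H, {1, 2, 3, 4})
```

In this example the maximal strict subsets of $[4]$ are $\{1,2\}$ and $\{3,4\}$, which form a partition of $[4]$ as stated above for finite $B$.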

We say that a family $(Q_n, n \ge 2)$ of distributions on the set of hierarchies of $[n]$, $n \ge 2$, is
consistent if for $T_{n+1} \sim Q_{n+1}$ we have $T_{n+1} \cap [n] \sim Q_n$. Let $T_n \sim Q_n$. We introduce the partition
$\Pi_n$ into maximal strict subsets of $[n]$ and refer to its distribution $P_n$ on $\mathcal{P}_n$ as splitting rule. We
say that $(Q_n, n \ge 2)$ is a labelled Markov branching model if conditionally given $\Pi_n = \pi$, the
hierarchies $T_n \cap \pi_i$, $i \ge 1$, have as their distribution $Q_{\#\pi_i}$ pushed forward under the natural
bijection on the set of hierarchies induced by the increasing bijection from $[\#\pi_i]$ to $\pi_i$.

Let $\mathcal{P}^j = \mathcal{P}^{\{[j], \{j+1\}\}}$, $j \ge 1$, and $\mathcal{P}^j_n = \mathcal{P}^j \cap [n]$. We say that a splitting rule $P_n$ is restricted
exchangeable if for all $1 \le j \le n - 1$ and $\pi, \pi' \in \mathcal{P}^j_n$ with the same multiset of block sizes, we have
$P_n(\{\pi\}) = P_n(\{\pi'\})$. Clearly, if $\kappa$ is a restricted exchangeable dislocation measure as in
Corollary 5, then

$$P_n(\{\pi\}) = \kappa(\mathcal{P}^\pi) / \kappa(\mathcal{P} \setminus \mathcal{P}^{\mathbf{1}_{[n]}}), \quad \pi \in \mathcal{P}_n \setminus \{\mathbf{1}_{[n]}\}, \quad n \ge 2, \qquad (4)$$

defines restricted exchangeable splitting rules and hence inductively a consistent Markov
branching model $(Q_n, n \ge 2)$. The converse also holds:

Proposition 11 All consistent labelled Markov branching models $(Q_n, n \ge 2)$ with restricted
exchangeable splitting rules $(P_n, n \ge 2)$ are of the form (4) for some restricted exchangeable
measure $\kappa$ as in Corollary 5.

Proof. In Pitman's [29] formalism of exchangeable partition probability functions (EPPFs)

$$p^j_n(\#\pi_1, \ldots, \#\pi_k) = P_n(\{\pi\}), \quad \pi \in \mathcal{P}^j_n = \mathcal{P}^{\{[j], \{j+1\}\}} \cap [n], \quad 1 \le j \le n - 1,$$

consistency in the restricted exchangeable case (extending [26, Formula (16)]) is equivalent to

$$p^j_n(n_1, \ldots, n_k) = p^n_{n+1}(n, 1) \, p^j_n(n_1, \ldots, n_k) + \sum_{i=1}^{k+1} p^j_{n+1}(n_1, \ldots, n_{i-1}, n_i + 1, n_{i+1}, \ldots, n_k).$$

For any $\lambda_2 \in (0, \infty)$ and $(1 - p^n_{n+1}(n, 1)) \lambda_{n+1} = \lambda_n$, $n \ge 2$, we see that $\kappa(\mathcal{P}^\pi) = \lambda_n P_n(\{\pi\})$,
$\pi \in \mathcal{K}$, defines a restricted exchangeable measure that has the properties required. □

By Kolmogorov's consistency theorem, we can consider a consistent family $(T_n, n \ge 1)$ of trees
$T_n \sim Q_n$ with $T_{n+1} \cap [n] = T_n$, $n \ge 1$, and associate $\mathcal{H} = \{A \subseteq \mathbb{N} : A \cap [n] \in T_n \text{ for all } n \ge 2\}$ as
random hierarchy of $\mathbb{N}$. In the setting of Corollary 5, we can consistently embed into continuous
time the blocks of $T_n$, $n \ge 2$, using exponential holding times $\eta_{[n]}$ of rate $\lambda_n$ for state $[n]$ and then
recursively, $\eta_{\pi_i}$ at rate $\lambda_{\#\pi_i}$ for any maximal strict subset $\pi_i$ that is created when $[n]$ splits, to
construct consistent homogeneous fragmentation processes $(F|_n(t), t \ge 0)$ in $\mathcal{P}_n$, $n \ge 2$. We can
also generalise Bertoin's [8] Poissonian construction to directly obtain restricted exchangeable
homogeneous fragmentation processes $(F(t), t \ge 0)$ in $\mathcal{P}$ that provide an alternative construction
of the same random hierarchy $\mathcal{H} = \{F_i(t), i \ge 0, t \ge 0\}^{\rm cl}$, but we do not need this alternative
construction and leave the details to the reader. We do note, however, that this construction
also provides an ordered hierarchy $\mathcal{H}^{\rm ord} = \{F(t), t \ge 0\}^{\rm cl} \subseteq \mathcal{P}$ of $\mathbb{N}$. For any such construction,
based on $\kappa$, say, to be meaningful, we require $\lambda_n = \kappa(\mathcal{P} \setminus \mathcal{P}^{\mathbf{1}_{[n]}}) < \infty$ for all $n \ge 2$.

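For $B \subseteq \mathbb{N}$, we call ordered hierarchy of $B$ any subset $\mathcal{H}^{\rm ord} \subseteq \mathcal{P}_B$ with $\mathbf{1}_B \in \mathcal{H}^{\rm ord}$ and
$\mathbf{0}_B \in \mathcal{H}^{\rm ord}$, such that for all $\Gamma, \Gamma' \in \mathcal{H}^{\rm ord}$, we have $\Gamma \preccurlyeq \Gamma'$ or $\Gamma' \preccurlyeq \Gamma$, where $\Gamma \preccurlyeq \Gamma'$ means that
for every block $\Gamma_i$ of $\Gamma$ there is a block $\Gamma'_j$ of $\Gamma'$ with $\Gamma_i \subseteq \Gamma'_j$.

We say that a family $(Q_n, n \ge 2)$ of distributions on the set of ordered hierarchies of $[n]$,
$n \ge 2$, is consistent if for $\mathcal{H}^{\rm ord}_{n+1} \sim Q_{n+1}$, we have $\mathcal{H}^{\rm ord}_{n+1} \cap [n] \sim Q_n$. Let $\mathcal{H}^{\rm ord}_n \sim Q_n$. We
introduce the partition $\Pi_n \in \mathcal{H}^{\rm ord}_n \setminus \{\mathbf{0}_{[n]}\}$ that is minimal for the partial order $\preccurlyeq$ and refer to its
distribution $P_n$ on $\mathcal{P}_n$ as merging rule. We say that $(Q_n, n \ge 2)$ is a discrete coalescent model if
conditionally given $\Pi_n = \pi = (\pi_1, \ldots, \pi_k)$, the ordered hierarchy $\mathcal{H}^{\rm ord}_n \cap \{\min \pi_i, i \in [k]\}$ has as
its distribution $Q_k$ pushed forward under the natural bijection on the set of ordered hierarchies
induced by the increasing bijection from $[k]$ to $\{\min \pi_i, i \in [k]\}$.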

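For $j = (k-1)(k-2)/2 + i$, let $\mathcal{P}^j = \mathcal{P}^{(\mathbf{0}_{[k-1]} \setminus \{\{i\}\}) \cup \{\{i, k\}\}}$, $1 \le i < k$, and $\mathcal{P}^j_n = \mathcal{P}^j \cap [n]$. A merging
rule $P_n$ is restricted exchangeable if for all $1 \le i < k \le n$ and $\pi, \pi' \in \mathcal{P}^j_n$ with the same multiset of
block sizes, we have $P_n(\{\pi\}) = P_n(\{\pi'\})$.

Proposition 12 All consistent discrete coalescent models $(Q_n, n \ge 2)$ with restricted
exchangeable merging rules $(P_n, n \ge 2)$ are of the form

$$P_n(\{\pi\}) = L(\mathcal{P}^\pi) / L(\mathcal{P} \setminus \mathcal{P}^{\mathbf{0}_{[n]}}), \quad \pi \in \mathcal{P}_n \setminus \{\mathbf{0}_{[n]}\}, \quad n \ge 2, \qquad (5)$$

where $L$ is a restricted exchangeable measure on $\mathcal{P}$ with decomposition $\mathcal{P}^0 = \{\mathbf{0}_{\mathbb{N}}\}$ and $\mathcal{P}^j$, $j \ge 1$.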
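
The indexing $j = (k-1)(k-2)/2 + i$ enumerates the pairs $1 \le i < k$ without gaps, and the cell of a given partition is determined by its first merger: $k$ is the least element that shares a block with a smaller element, and $i$ is that smaller element. A minimal Python sketch of this bookkeeping follows; the function names `cell_index` and `first_merger` are hypothetical.

```python
def cell_index(i, k):
    """Index j = (k-1)(k-2)/2 + i of the cell in which the elements
    1, ..., k-1 are in distinct blocks and k joins the block of i."""
    if not 1 <= i < k:
        raise ValueError("need 1 <= i < k")
    return (k - 1) * (k - 2) // 2 + i

def first_merger(pi):
    """Return (i, k) for a finite partition pi (list of sets), or None if
    all blocks are singletons."""
    least = {x: min(block) for block in pi for x in block}
    for k in sorted(least):
        if least[k] < k:          # k is not the least element of its block
            return least[k], k
    return None

pairs = [(i, k) for k in range(2, 6) for i in range(1, k)]
indices = [cell_index(i, k) for i, k in pairs]   # 1, 2, ..., 10 in order
```

For instance, the partition with blocks $\{1\}, \{2, 4\}, \{3\}$ has first merger $(i, k) = (2, 4)$ and therefore lies in the cell with index 5.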

Generalising Schweinsberg [32], we call such measures L on V coalescent L-measures. 

Proof. The proof is similar to Proposition [TTJ In obvious notation, consistency is equivalent 

(n \ e+i 

Y.p'nlW^ ... ,1,2) pj*(m, ... ,m) + X)p«+i( ni ' • ••,% + !>•• 

and we need ^1 — X^iPl+l^l, • • • , 1, 2)1 A n+ i = A n , n > 2, to conclude. □ 

In fact, an analogous characterisation result also holds for any other decomposition of $\mathcal{P}$ into
any zero-mass $\mathcal{P}^0$ and cylinder sets $\mathcal{P}^j$, $j \ge 1$, or unions of cylinder sets.

To consistent $\mathcal{H}^{\rm ord}_n$, $n \ge 2$, we associate $\mathcal{H}^{\rm ord} = \{\Gamma \in \mathcal{P} : \Gamma \cap [n] \in \mathcal{H}^{\rm ord}_n \text{ for all } n \ge 2\}$ as
random ordered hierarchy of $\mathbb{N}$ and also processes $(F|_n(t), t \ge 0)$ and $(F(t), t \ge 0)$ by embedding
into continuous time using rates $\lambda_n$, $n \ge 2$, as well as Schweinsberg's [32] Poissonian construction.



3.3 Integral representation for restricted exchangeable coalescent L-measures 

As a direct consequence of Theorem [3l mimicking Corollary we obtain 

Proposition 13 Let $L$ be a restricted exchangeable coalescent $L$-measure. Then there are constants
$c_{ij} \ge 0$, $k_{ij} \ge 0$ and a measure $\nu_{ij}$ on $\mathcal{S}^\downarrow$ satisfying

[ ( 2 S io II Si - ) U v( ds ) < °°> !<^<i, 
\i, /, , m=l / 

where the sum is over indices that are distinct or zero except that io ^ 0, such that 
L= ^2 [cijSu&j} + hjS £ ij]\{i] + / K s(- n V lJ )vij(ds) j , 

l<i<j<oo ^ ' 

where $\kappa_s$ is Kingman's paintbox.

4 Embedding in self-similar CRTs, proof of Theorem 6

4.1 Self-similar CRTs, fragmentation processes and spinal decomposition 

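Aldous [2] called a pair $(\mathcal{T}, \mu)$ a continuum tree if $\mathcal{T}$ is an $\mathbb{R}$-tree, $\mu$ a finite measure on $\mathcal{T}$, with

1. the measure $\mu$ supported by the set ${\rm Lf}(\mathcal{T})$ of leaves of $\mathcal{T}$,

2. the measure $\mu$ atomless,

3. for every $x \in \mathcal{T} \setminus {\rm Lf}(\mathcal{T})$, positive mass $\mu(\mathcal{T}_x) > 0$ in the subtree $\mathcal{T}_x$ rooted at $x$.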
In the sequel, we will often specify a root vertex $\rho \in \mathcal{T}$ and the distance function $d$. For
technical simplicity, we follow Aldous [3] and use CRTs in $\ell_1 = \ell_1(\mathbb{N})$. We endow the set of
compact subsets of $\ell_1$ with the Hausdorff metric, and the set of finite measures on $\ell_1$ with any
metric inducing the topology of weak convergence, so that the set of pairs $(\mathcal{T}, \mu)$, where $\mathcal{T}$ is
a rooted $\mathbb{R}$-tree embedded as a subset of $\ell_1$ and $\mu$ is a finite measure on $\mathcal{T}$, is endowed with the
product Borel $\sigma$-algebra.

A Continuum Random Tree (CRT) is a random variable with values in the set of continuum
trees. To be specific, we call distribution of a CRT $(\mathcal{T}, \mu, \rho, d)$ the distribution on this set of pairs of the
particular random isometric embedding in $\ell_1$ obtained from a random sample $\Sigma^*_i$, $i \ge 1$, of
independent leaves with distribution $\mu/\mu(\mathcal{T})$, using $0 \in \ell_1$ as the root and the $i$th coordinate
direction in $\ell_1$ to embed the branch leading to leaf $\Sigma^*_i$, finally passing to the $\ell_1$-closure and the
weak limit of the $\mu(\mathcal{T})$-multiples of empirical measures of the embedded $\Sigma^*_1, \ldots, \Sigma^*_i$, $i \ge 1$.

For $x \in [0, 1]$ and $s \in \mathcal{S}^\downarrow$, we denote by $Q_x$ the distribution of the $\alpha$-scaled tree $(\mathcal{T}, x\mu, \rho, x^\alpha d)$
and by $Q_s$ the distribution of a bush of independent trees with distributions $Q_{s_i}$, $i \ge 1$, all
grafted to the same root. For every $u \ge 0$, consider the bush $\mathcal{B}^\alpha(u)$ obtained by grafting the
connected components $\mathcal{T}^\alpha_i(u)$, $i \in I$, of the open set $\{x \in \mathcal{T} : d(x, \rho) > u\}$ to the same root.
A CRT is called $\alpha$-self-similar in the sense of [19], if for all $u \ge 0$ and conditionally given
$(\mu(\mathcal{T}^\alpha_i(u)), i \in I)^\downarrow = s \ne 0$, we have $\mathcal{B}^\alpha(u) \sim Q_s$.

For $\alpha \in \mathbb{R}$, a $\mathcal{P}$-valued process $(\Pi(t), t \ge 0)$ is an exchangeable $\alpha$-self-similar fragmentation
process if given $\Pi(t) = \pi$, the partition $\Pi(t + s)$ has the same law as the random partition whose
blocks are those of $\pi_i \cap \Pi^{(i)}(|\pi_i|^\alpha s)$, $i \ge 1$, where $(\Pi^{(i)}, i \ge 1)$ is a sequence of i.i.d. copies of
$(\Pi(t), t \ge 0)$. The process $X^\alpha = (|\Pi(t)|^\downarrow, t \ge 0)$ is an $\mathcal{S}^\downarrow$-valued $\alpha$-self-similar fragmentation.
Bertoin proved in [5] that the distribution of an exchangeable $\mathcal{P}$-valued self-similar fragmentation
is determined by a triple $(\alpha, c, \nu)$, where $\nu$ is a dislocation measure on $\mathcal{S}^\downarrow$, i.e. $\nu(s_1 = 1) = 0$
and $\int_{\mathcal{S}^\downarrow} (1 - s_1)\,\nu(ds) < \infty$. For this paper, we are only interested in the case $c = 0$ and when $\nu$
is conservative, i.e. $\nu(s_0 > 0) = 0$, where $s_0 = 1 - \sum_{i \ge 1} s_i$. We call $(\alpha, \nu)$ characteristic pair.

Haas and Miermont in [19] have shown that there exists a self-similar continuum random
tree characterized by such a pair $(\alpha, \nu)$, provided also that $\nu$ is infinite. Specifically, $Y^\alpha =
((\mu(\mathcal{T}^\alpha_i(t)), i \in I)^\downarrow, t \ge 0)$ has the same distribution as $X^\alpha$.

Consider $s \in \mathcal{S}^\downarrow$ and $S^{(i)} \in \mathcal{S}^\downarrow$, $i \ge 1$. We call fragmentation of $s$ by $S^{(\cdot)}$ the mass partition
${\rm Frag}(s, S^{(\cdot)})$ given by the decreasing rearrangement of $(s_i S^{(i)}_j, i, j \in \mathbb{N})$. Bertoin showed that the
process $(X^\alpha(t), t \ge 0)$ is Markovian and its semigroup can be described as follows. For every
$t, t' \ge 0$, the conditional distribution of $X^\alpha(t + t')$ given $X^\alpha(t) = s$ is the law of ${\rm Frag}(s, S^{(\cdot)})$,
where each independently is distributed as X a (t' s~ a ), see Proposition 3.7]. 
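The operation Frag is easy to make concrete for mass partitions with finitely many non-zero parts. The sketch below is an illustration only: it assumes finite Python lists in place of elements of the space of decreasing sequences, and the function name `frag` is ours.

```python
def frag(s, s_children):
    """Fragmentation of the mass partition s by the sequence s_children.

    s          : list of part sizes s_1 >= s_2 >= ... (finitely many)
    s_children : list of mass partitions, s_children[i] splitting part s[i]

    Returns the decreasing rearrangement of all products s_i * S^(i)_j,
    with zero masses dropped.
    """
    products = [s_i * child_j
                for s_i, child in zip(s, s_children)
                for child_j in child]
    return sorted((p for p in products if p > 0), reverse=True)
```

For instance, splitting the first half of $(1/2, 1/2)$ into two equal parts and leaving the second half intact gives `frag([0.5, 0.5], [[0.5, 0.5], [1.0]])`, i.e. the mass partition $(1/2, 1/4, 1/4)$.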

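Given a CRT $(\mathcal{T}, \mu, \rho, d)$ and $v \in \mathcal{T}$, we denote by $v^\alpha(u)$ the point on the spine $[[\rho, v]]$ with
$d(\rho, v^\alpha(u)) = u$ and obtain a parameterisation by distance $[[\rho, v]] = \{v^\alpha(u), 0 \le u \le d(\rho, v)\}$.
We consider the subtree $\mathcal{T}^\alpha_{(v)}(u) = \{w \in \mathcal{T} : d(\rho, w \wedge v) > u\}$ of $\mathcal{T}$ containing $v$ rooted at $v^\alpha(u)$,
and its mass $X^\alpha_{(v)}(u) = \mu(\mathcal{T}^\alpha_{(v)}(u))$. Let $\eta_v$ be the self-similar time change with

$$\eta_v(t) = \inf\Big\{u \ge 0 : \int_0^u (X^\alpha_{(v)}(y))^{-\alpha}\,dy > t\Big\}, \qquad 0 \le t < \zeta_v = \int_0^{d(\rho,v)} (X^\alpha_{(v)}(y))^{-\alpha}\,dy. \tag{6}$$

Then $v^\circ(t) = v^\alpha(\eta_v(t))$, $\mathcal{T}^\circ_{(v)}(t) = \mathcal{T}^\alpha_{(v)}(\eta_v(t))$ and $X^\circ_{(v)}(t) = \mu(\mathcal{T}^\circ_{(v)}(t))$ are the associated
quantities in homogeneous time. Note, in particular, the parameterisation of the spine in homogeneous
time $[[\rho, v[[ = \{v^\circ(t), 0 \le t < \zeta_v\}$. Denote by $S_v(t) = (S^i_v(t), i \ge 1) \in \mathcal{S}^\downarrow$ the sequence
such that $X^\circ_{(v)}(t-)\,S_v(t)$ is the decreasing sequence of $\mu$-masses of the connected components of
$\{w \in \mathcal{T} : v^\circ(t) \in [[\rho, w[[\}$, also $F_v(t) = X^\circ_{(v)}(t)/X^\circ_{(v)}(t-)$ the member of $S_v(t)$ corresponding to
the subtree containing $v$. Moreover, we denote by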



Hb° (t) (t) \ 
the associated rescaled spinal bush, of mass $1 - F_v(t)$, at homogeneous time $t \ge 0$.

The following lemma is a description in the CRT framework of Bertoin's tagged particle 
process that is a bit richer than often stated, but follows from the same arguments. 

Lemma 14 Let $(\mathcal{T}, \mu, \rho, d)$ be an $\alpha$-self-similar CRT with characteristic pair $(\alpha, \nu)$ and $\Sigma^* \sim \mu$.
Then $(S_{\Sigma^*}, F_{\Sigma^*})$ is a Poisson point process on $\mathcal{S}^\downarrow \times (0,1)$ with intensity measure $\nu^*$ given by

$$\nu^*(ds \otimes dx) = \sum_{i=1}^\infty s_i \delta_{s_i}(dx)\,\nu(ds).$$

Proof. Let $X^\alpha$ be the self-similar mass-fragmentation process corresponding to the CRT $(\mathcal{T}, \mu)$
and $X$ the homogeneous mass-fragmentation process of $X^\alpha$ through the self-similar time-change.
Extending the probability space, if necessary, denote by $\Pi$ a homogeneous exchangeable $\mathcal{P}$-valued
fragmentation process associated with $X$. Without loss of generality, we can consider
$X^\circ_{(\Sigma^*)}(t) = |\Pi_1(t)|$ by exchangeability. Let $\Pi^{(1)}(t)$ be the partition of $\mathbb{N}$ such that $\Pi_1(t) =
{\rm Frag}(\Pi_1(t-), \Pi^{(1)}(t))$, then $S_{\Sigma^*}(t) = |\Pi^{(1)}(t)|^\downarrow$. By the Poissonian construction of exchangeable
fragmentation processes, $\Pi^{(1)}$ is a Poisson point process with intensity measure $\kappa = \int_{\mathcal{S}^\downarrow} \kappa_s\,\nu(ds)$.
Hence, $S_{\Sigma^*}$ is a Poisson point process on $\mathcal{S}^\downarrow$ with intensity measure $\nu$.

As $\Sigma^*$ is chosen according to $\mu$, it is not hard to show that $(S_{\Sigma^*}, F_{\Sigma^*})$ relates to $S_{\Sigma^*}$ via the
size-biased marking kernel $K^*(s, \cdot) = \sum_{i \ge 1} s_i \delta_{s_i}$ and so $(S_{\Sigma^*}, F_{\Sigma^*})$ is a Poisson point process
with intensity $K^*(s, dx)\,\nu(ds) = \nu^*(ds \otimes dx)$. □

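By the stopping line argument of [21, Proposition 4], this yields the following joint description
of the ordered coarse and unordered fine spinal decompositions along the spine to $\Sigma^* \sim \mu$.

Proposition 15 (Spinal decomposition [9, 21]) Let $(\mathcal{T}, \mu, \rho, d)$ be an $\alpha$-self-similar CRT
with characteristic pair $(\alpha, \nu)$ and $\Sigma^* \sim \mu$. Then the process $(S_{\Sigma^*}, F_{\Sigma^*}, \mathcal{B}^\circ_{(\Sigma^*)})$ is a Poisson point
process with intensity measure

$$\nu^*_{\rm bush}(ds \otimes dx \otimes d\mathcal{T}) = \sum_{i \ge 1} s_i \delta_{s_i}(dx)\,Q_{(s_1, \ldots, s_{i-1}, s_{i+1}, \ldots)}(d\mathcal{T})\,\nu(ds).$$

Conversely, $(\mathcal{T}, \mu, \rho, d)$ is a measurable function of $(S_{\Sigma^*}, F_{\Sigma^*}, \mathcal{B}^\circ_{(\Sigma^*)})$.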

4.2 A generic procedure to sample a leaf from a self-similar CRT 

Our aim is to generalise Lemma 14 and Proposition 15 to leaves other than the $\mu$-sampled leaf
$\Sigma^*$, where we are effectively marking a Poisson point process with intensity measure $\nu$ using the
size-biased marking kernel $K^*(s, \cdot) = \sum_{i \ge 1} s_i \delta_{s_i}$ from $\mathcal{S}^\downarrow$ to $(0, 1)$. We will now consider other
marking kernels. It will be convenient to adopt an idea from Pitman's EPPF formalism and
specify the probability $P(s, x)$ that a specific part of size $x$ is chosen, so that
the probability of choosing a mass $x$ is $K(s, \{x\}) = m_x P(s, x)$, where $m_x$ is the number of $i \ge 1$
with $s_i = x$ in $s = (s_i, i \ge 1) \in \mathcal{S}^\downarrow$.

Definition 16 A measurable function $P : \mathcal{S}^\downarrow \times (0, 1) \to [0, 1]$ that fulfils the two conditions

• $P(s, x) = 0$ if $x \notin \{s_i, i \ge 1\}$;

• $\sum_{i \ge 1} P(s, s_i) = 1$,

is called a selection probability function (SPF).

Example 17 The SPF associated with a leaf chosen according to $\mu$ is $P(s, s_i) = s_i$.
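The two conditions of Definition 16 and the relation $K(s, \{x\}) = m_x P(s, x)$ can be illustrated numerically for a mass partition with finitely many parts. This is a minimal sketch under that finiteness assumption; the names `size_biased_spf`, `is_spf_on` and `selection_kernel` are hypothetical helpers, not notation from the paper.

```python
from collections import Counter


def size_biased_spf(s, x):
    """SPF of Example 17: a part of size x is chosen with probability x."""
    return x if x in s else 0.0


def is_spf_on(P, s, tol=1e-12):
    """Check the normalisation sum_i P(s, s_i) = 1 and the range [0, 1] on s."""
    values = [P(s, s_i) for s_i in s]
    return all(0.0 <= v <= 1.0 for v in values) and abs(sum(values) - 1.0) <= tol


def selection_kernel(P, s):
    """Marking kernel K(s, {x}) = m_x * P(s, x), with m_x the multiplicity of x in s."""
    return {x: m_x * P(s, x) for x, m_x in Counter(s).items()}
```

With `s = [0.5, 0.25, 0.25]` the size-biased SPF is normalised, and the kernel puts mass 1/2 on the value 0.5 and mass 1/2 on the value 0.25, since the latter occurs with multiplicity 2.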
Let us now explain a generic procedure to sample a single special leaf $\Sigma$ based on an SPF $P$
from a self-similar CRT $(\mathcal{T}, \mu, \rho, d)$ with dislocation measure $\nu$.

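Procedure 1 Let $P$ be an SPF as in Definition 16 fulfilling

$$\int_{\mathcal{S}^\downarrow} \sum_{i=1}^\infty (1 - s_i) P(s, s_i)\,\nu(ds) < \infty. \tag{7}$$

0. We start from $(\mathcal{T}_1, \mu_1, \rho_1, d_1) := (\mathcal{T}, \mu, \rho, d)$ and $i = 1$.

1. Conditionally given $(\mathcal{T}_i, \mu_i, \rho_i, d_i)$, let $\Sigma_{(i)} \sim \mu_i$.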

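2. Conditionally given $(\mathcal{T}_i, \Sigma_{(i)})$, we consider the parameterisation in homogeneous time of the
spine $[[\rho_i, \Sigma_{(i)}[[ = \{\Sigma^\circ_{(i)}(t), t \ge 0\}$ and pick as $\mathcal{T}_{i+1}$ a subtree $\mathcal{S}$ above $\Sigma^\circ_{(i)}(t)$ with probability

3. Let $\tau_{(i)} = \inf\{t \ge 0 : \mathcal{T}_{i+1} \cap \mathcal{T}^\circ_{(\Sigma_{(i)})}(t) = \emptyset\}$. We turn $\mathcal{T}_{i+1}$ into a CRT with rescaled mass
measure, root and rescaled distance function as follows: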



4. Repeat within the subtree $(\mathcal{T}_{i+1}, \mu_{i+1}, \rho_{i+1}, d_{i+1})$ by increasing $i$ by 1 and proceeding to 1.

5. As $i \to \infty$, we obtain a sequence $(\Sigma^\circ_{(i)}(\tau_{(i)}), i \ge 1)$ in $\mathcal{T}$ that increases in the sense that
$\Sigma^\circ_{(i)}(\tau_{(i)}) \in [[\rho, \Sigma^\circ_{(i+1)}(\tau_{(i+1)})]]$ and hence converges. Let $\Sigma = \lim_{i \to \infty} \Sigma^\circ_{(i)}(\tau_{(i)})$.

Roughly speaking, this sampling procedure is that we travel along the spine $[[\rho, \Sigma_{(1)}]]$ and
keep selecting subtrees until the first time we choose a subtree not containing $\Sigma_{(1)}$ and then
repeat inductively in the subtree until we reach a leaf $\Sigma$ in the limit. We show in the following
proposition that there is a spinal subordinator associated with $\Sigma$.



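Proposition 18 Let $\Sigma$ be sampled according to Procedure 1.

(i) Then the process $(S_\Sigma, F_\Sigma, \mathcal{B}^\circ_{(\Sigma)})$ is a Poisson point process with intensity measure

$$\nu^\Sigma_{\rm bush}(ds \otimes dx \otimes d\mathcal{T}) = \sum_{i \ge 1} P(s, s_i)\,\delta_{s_i}(dx)\,Q_{(s_1, \ldots, s_{i-1}, s_{i+1}, \ldots)}(d\mathcal{T})\,\nu(ds).$$

Specifically, $((S_\Sigma(t), F_\Sigma(t), \mathcal{B}^\circ_{(\Sigma)}(t)), 0 \le t < \tau_{(1)})$ is a Poisson point process with
killing rate $\int_{\mathcal{S}^\downarrow} \sum_{i \ge 1} (1 - s_i) P(s, s_i)\,\nu(ds)$ and (accordingly thinned) intensity measure

$$\nu^{(1)}_{\rm bush}(ds \otimes dx \otimes d\mathcal{T}) = \sum_{i=1}^\infty s_i P(s, s_i)\,\delta_{s_i}(dx)\,Q_{(s_1, \ldots, s_{i-1}, s_{i+1}, \ldots)}(d\mathcal{T})\,\nu(ds).$$

(ii) Let $\xi_\Sigma(t) = -\log X^\circ_{(\Sigma)}(t)$, $t \ge 0$. Then $\xi_\Sigma$ is a pure jump subordinator with Laplace exponent
$\Phi_\Sigma$ and Lévy measure $\Lambda_\Sigma$ given by

$$\Phi_\Sigma(q) = \int_{\mathcal{S}^\downarrow} \sum_{i=1}^\infty (1 - s_i^q) P(s, s_i)\,\nu(ds) \quad\text{and}\quad \Lambda_\Sigma = \int_{\mathcal{S}^\downarrow} \sum_{i=1}^\infty P(s, s_i)\,\delta_{-\log s_i}\,\nu(ds). \tag{8}$$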
Proof. (i) This proof relies heavily on Poisson point process techniques. We use the terminology
of Kingman [25]. By Proposition 15, the process $(S_{\Sigma_{(1)}}, F_{\Sigma_{(1)}}, \mathcal{B}^\circ_{(\Sigma_{(1)})})$ is a Poisson point process
with intensity measure $\nu^*_{\rm bush}$. Step 2. of Procedure 1 can be read and analysed as follows. We
mark some points of this Poisson point process with a selected subtree $\mathcal{T}^{\rm sel}(t)$ using the kernel

$$K(s, x, \mathcal{B}'; d\mathcal{T}'') = P(s, x)\,\delta_{\{0\}}(d\mathcal{T}'') + \sum_{\mathcal{S} \text{ connected component of } \mathcal{B}' \setminus \{\rho'\}} P(s, \mu'(\mathcal{S}))\,\delta_{\mathcal{S}}(d\mathcal{T}''),$$

where $\mathcal{B}'$ is short for $(\mathcal{B}', \mu', \rho', d')$ and $\mathcal{T}''$ is short for $(\mathcal{T}'', \mu'', \rho'', d'')$, also $\mathcal{S}$ for $(\mathcal{S}, \mu'|_{\mathcal{S}}, \rho', d'|_{\mathcal{S} \times \mathcal{S}})$
and $\{0\}$ for $(\{0\}, 0, 0, 0)$. By standard marking and mapping, we get a new Poisson point process
(S^.F^.B-,,^,), where B^^t) = Sf E(1)) (t) with intensity measure 

^ (da:) ( P(s, (dB')*{o} (dT") + £ P(s, ^)<fe) (^')<3^ (<*T") I K<*0, 

where $s^{(i)} = (s_1, \ldots, s_{i-1}, s_{i+1}, \ldots)$ is the sequence $s$ with $s_i$ removed and similarly $s^{(i,j)}$ is the
sequence $s$ with $s_i$ and $s_j$ removed.

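In Step 3., we set $\tau_{(1)} = \inf\{t \ge 0 : \mathcal{T}^{\rm sel}(t) \ne \{0\}\}$, exponentially distributed with rate

$$\int_{\mathcal{S}^\downarrow} \sum_{i \ge 1} s_i \sum_{j \ne i} P(s, s_j)\,\nu(ds) = \int_{\mathcal{S}^\downarrow} \sum_{j \ge 1} (1 - s_j) P(s, s_j)\,\nu(ds) < \infty,$$

note (7). Standard thinning and projecting yields that $((S_\Sigma(t), F_\Sigma(t), \mathcal{B}^\circ_{(\Sigma)}(t)), 0 \le t < \tau_{(1)}) =
((S_{\Sigma_{(1)}}(t), F_{\Sigma_{(1)}}(t), \mathcal{B}^\circ_{(\Sigma_{(1)})}(t)), 0 \le t < \tau_{(1)})$ is an independently killed Poisson point process with
intensity measure $\sum_{i \ge 1} s_i P(s, s_i)\,\delta_{s_i}(dx)\,Q_{s^{(i)}}(d\mathcal{B}')\,\nu(ds)$, as required for the second assertion.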
The rescaled tree T2 = ^jf x )( r (i)) ~ Qi 1S independent of this killed Poisson point process and 
also jointly independent of the pair formed by the bush ^ and the rescaled tree ^(s (1) )( r (i)) 
that has distribution Q Si for Si = F^^Jr^), using the converse statement in Proposition [T5l 

In Step 4., the induction proceeds on % ~ Qi, i > 2, all independent of the past, so 
this Poisson point process extends indefinitely, but ignores points at + . . . + tu\ , i > 
1. These are exponentially spaced and i.i.d., hence form an independent Poisson point pro- 
cess. The independence and distributional properties that we noted identify the distribution of 
(5 E (r (1) ),F E (r ( i)),Bf E) (r (1) )) = (S^) (r (1) ), ^\T^ m) (r {1) )), B{), and so the intensity measure 

Y,SiY,P(^ s 3)^M x )Q{s^))( dB >( ds ) = - s 1 )P(s,s j )5 S] (dx)Q^)(dB')u(ds), 

because Bi is obtained by grafting to the same root B T ^^(r^) ~ Qs(«,i) an d the rescaled 

7^, )( r (i)) with distribution Standard superposition completes the proof of (i). 

(ii) By (i) and standard mapping, $(\Delta\xi_\Sigma(t), t \ge 0)$ is a Poisson point process with intensity
measure $\Lambda_\Sigma$, hence $\xi_\Sigma(t) = \sum_{s \le t} \Delta\xi_\Sigma(s)$ is a pure jump subordinator with Laplace exponent $\Phi_\Sigma$. □

Remark 19 In fact, Proposition 18 (ii) shows that for any Lévy measure $\Lambda$ with the form in
(8), we can find a leaf $\Sigma$ from this generic sampling procedure with some selection probability
$P$ such that the Lévy measure of its spinal subordinator coincides with $\Lambda$.
4.3 A procedure to sample a sequence of leaves from a self-similar CRT

In this section, we formulate a special inductive procedure to sample $k$ leaves $\Sigma_1, \ldots, \Sigma_k$ from a
self-similar CRT $(\mathcal{T}, \mu)$ with characteristic pair $(\alpha, \nu)$, where

j>i V>i / 

for some finite measures Uj, j > 1, representing a restricted exchangeable dislocation measure as 
in Corollary Clearly, the measures Uj, j > 1, are absolutely continuous with respect to v. We 
denote their Radon-Nikodym derivatives by fj = duj/dv, j > 1, and define selection functions 



52i>k 8 i( 1 - 8 i)M a ) 



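Procedure 2 (0) To sample $\Sigma_1$ in the whole CRT $(\mathcal{T}_{1,\emptyset}, \mu_{1,\emptyset}, \rho_{1,\emptyset}, d_{1,\emptyset}) = (\mathcal{T}, \mu, \rho, d)$ we use
step $(k, \emptyset)$ for $k = 1$.

$(k, \emptyset)$ Sample leaf $\Sigma_k$ in $\mathcal{T}_{k,\emptyset}$ according to Procedure 1 using the SPF $P^{\rm old}_k$. Then increase $k$ by
1, set $B = [k - 1]$ and $\mathcal{T}_{k,B} = \mathcal{T}$, and proceed to step $(k, B)$.

$(k, B)$ 1. We are provided with $\Sigma_i$, $i \in B \ne \emptyset$, embedded in $\mathcal{T}_{k,B}$, and denote the branch point
that separates the labels in $B$ into several subtrees by $v_{k,B}$, given by

$$[[\rho_{k,B}, v_{k,B}]] = \bigcap_{i \in B} [[\rho_{k,B}, \Sigma_i]].$$

2. Conditionally given $(\mathcal{T}_{k,B}; \Sigma_i, i \in B)$, we consider the spine $[[\rho_{k,B}, v_{k,B}[[ = \{v^\circ_{k,B}(t), 0 \le
t < \zeta_{v_{k,B}}\}$ and pick as $\mathcal{T}_{k,B'}$ either a new subtree $\mathcal{S}$ above some $v^\circ_{k,B}(t)$ with probability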



%,B\ £ B 



or a new or old subtree S above v k> B with probability 
Tub 1 = <S 



T k , B ;V i ,ieB) = ^ B {S) Pf B (S^Ht'),F VkB (t>)), 

where $B' = \{i \in B : \Sigma_i \in \mathcal{S}\}$ and new/old means without/with any $\Sigma_i$, $i \in B$.

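3. Let $\tau_{k,B} = \inf\{t \ge 0 : \mathcal{T}_{k,B'} \cap \mathcal{T}^\circ_{(v_{k,B})}(t) = \emptyset\}$. We turn $\mathcal{T}_{k,B'}$ into a CRT with rescaled
mass measure, root and rescaled distance function as follows:

$$\mu_{k,B'} = \frac{\mu_{k,B}|_{\mathcal{T}_{k,B'}}}{\mu_{k,B}(\mathcal{T}_{k,B'})}, \qquad \rho_{k,B'} = v^\circ_{k,B}(\tau_{k,B}), \qquad d_{k,B'} = \frac{d_{k,B}|_{\mathcal{T}_{k,B'} \times \mathcal{T}_{k,B'}}}{(\mu_{k,B}(\mathcal{T}_{k,B'}))^\alpha}.$$

4. Repeat within the subtree $(\mathcal{T}_{k,B'}, \mu_{k,B'}, \rho_{k,B'}, d_{k,B'})$ by proceeding to step $(k, B')$.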
From Proposition 18 we obtain the following by straightforward arguments.

Corollary 20 Sample $(\Sigma_k, k \ge 1)$ following Procedure 2. Let $v_k$ be the branch point in $\mathcal{T}$ that
separates $[k]$ into different subtrees, $k \ge 1$. Then $((S_{\Sigma_k}(t), F_{\Sigma_k}(t), \mathcal{B}^\circ_{(\Sigma_k)}(t)), 0 \le t < \zeta_{v_k})$ is a
Poisson point process with killing rate $\lambda_k = \int_{\mathcal{S}^\downarrow} \sum_{i \ge 1} \sum_{\ell=1}^{k-1} s_i^\ell (1 - s_i)\,\nu_\ell(ds)$ and intensity measure

$$\nu^k_{\rm bush}(ds \otimes dx \otimes d\mathcal{T}) = \sum_{i=1}^\infty \delta_{s_i}(dx)\,Q_{(s_1, \ldots, s_{i-1}, s_{i+1}, \ldots)}(d\mathcal{T}) \sum_{\ell=k}^\infty s_i^\ell (1 - s_i)\,\nu_\ell(ds). \tag{9}$$

Note that $\lambda_1 = 0$, so in this case, the Poisson point process is not killed and Corollary 20
describes the whole tree $\mathcal{T}$ jointly with $\Sigma_1$, decomposed along its spine $[[\rho, \Sigma_1[[$. For $k \ge 2$,
Corollary 20 describes a spinal decomposition along $[[\rho, v_k[[$, except that the subtrees above $v_k$
are not described. We will do this in Lemma 21 below, using more refined arguments.

Proof. The case $k = 1$ follows straight from step $(1, \emptyset)$ of Procedure 2 and Proposition 18. We
then proceed by induction in $k$. Assuming that the statement is true for $k$, step $(k + 1, [k])$ 2.
and standard thinning with probabilities $P^{\rm old}_{k+1}(s, s_i)$ yields

$$\nu^{k+1}_{\rm bush}(ds \otimes dx \otimes d\mathcal{T}) = \sum_{i=1}^\infty P^{\rm old}_{k+1}(s, s_i)\,\delta_{s_i}(dx)\,Q_{(s_1, \ldots, s_{i-1}, s_{i+1}, \ldots)}(d\mathcal{T}) \sum_{\ell=k}^\infty s_i^\ell (1 - s_i)\,\nu_\ell(ds)$$

as claimed, and an extra rate $\int_{\mathcal{S}^\downarrow} \sum_{i \ge 1} (1 - P^{\rm old}_{k+1}(s, s_i)) \sum_{\ell=k}^\infty s_i^\ell (1 - s_i)\,\nu_\ell(ds)$ is added to the
killing rate $\lambda_k$ from the induction hypothesis. This completes the induction step. □

To identify the distribution of $(\mathcal{T}; \Sigma_i, i \in [k])$ constructed according to Procedure 2 run
up to some $k \ge 2$, we study its branching structure recursively by specifying the first branch
point $v_k$ that separates $[k]$ into several subtrees denoted by $\mathcal{T}^{[k]}_j$ with label partition $\Pi^{[k]}$ and
a remaining bush $\mathcal{B}_{[k]}$ of unlabelled subtrees, with joint relative subtree sizes $S^{[k]} \in \mathcal{S}^\downarrow$. For
$x \in (0, 1]$ and $B = \{b_1, \ldots, b_k\} \subset \mathbb{N}$ with $1 \le b_1 < \ldots < b_k$, it will be convenient to denote
by $Q^B_x$ the distribution of a rescaled and relabelled version of $(\mathcal{T}; \Sigma_i, i \in [k])$, where the mass
measure has been multiplied by $x$, the distance function by $x^\alpha$, and $\Sigma_i$ is renamed to $\Sigma_{b_i}$, $i \in [k]$.

Lemma 21 The first branching of(T; S,,i G [k]) separating [k] and associated subtrees described 
inC% = {SW,IlW,TW,B [k] ) are independent of Cf e = ((S^(t),F Vk (t),B° {vk) (t)),0 < t < ( Vk ), 
with distribution given by 

W(S [k] G da, = 7T, (7i [fc] ; Si, t G tti) G cffi, . . . , (T r M; S i; i G 7r r ) G dT r , B [k] G dB') 
= y I E Qg(ii....,«r) (dB 7 ) fl sf^Ql-, ) ^(ds), 

distinct £=1 / 

where ir = (tti, . . . , 7r r ) G Vk and m = min7r2 — 1, also ^(*i>— >v) i s s ^^/j Sl) _ _ _ ^ s ^ removed. 

The kernel ^(dTi ® • • ■ ® cff r d£?') = distinct QjCn,.^) {dB') ELLl 4^QV h (dT e ) is 

a fancy paintbox that equips each block under k s with a tree and embeds the labels for ir G /C. 

Proof. For $k = 1$, this is trivial since $v_1 = \Sigma_1$ is a leaf. Now suppose that the result holds for
all $[j] \subseteq [k]$, and consider $k + 1$. In our use of standard Poisson point process arguments as well
as in extracting from Procedure 2 as from Procedure 1, we build on the proof of Proposition 18.
For $\pi \in \mathcal{P}_{k+1} \setminus \{1_{[k+1]}\}$, let $A_\pi = \{\Pi^{[k+1]} = \pi\}$ be the event that $v_{k+1}$ splits $[k + 1]$ into $\pi$.
The simplest case is for $\pi = \{[k], \{k + 1\}\}$. By Corollary 20, the decomposition of $\mathcal{T}$ along the
trunk $[[\rho, v_k[[$ is given by the Poisson point process $((S_{\Sigma_k}(t), F_{\Sigma_k}(t), \mathcal{B}^\circ_{(\Sigma_k)}(t)), 0 \le t < \zeta_{v_k})$ with
intensity measure (9), killed at rate $\lambda_k = \int_{\mathcal{S}^\downarrow} \sum_{i \ge 1} \sum_{\ell=1}^{k-1} s_i^\ell (1 - s_i)\,\nu_\ell(ds)$. By comparison with
the statement of Corollary 20 for $k + 1$, we see $P(A_{\{[k], \{k+1\}\}}) = 1 - \lambda_k/\lambda_{k+1}$. Conditionally given



A {[k],{k+l}}, the distribution of (^H^+i.Ifc]), ^s^r fc+1 j fc] ), ^^(r^^^), T^' k] (T k+1>[k] )) is 



^ ZTT E E P " W ( S ' s - a i)*»< ( dx )Qsi^ {dB')Q S3 (dT") 4(1 " 8i)ut{da) 

oo 

= - Y,Y, 5 ^ dx )Qs^AdB')Q S] (dT'')sf Sj u k (ds), (10) 

fc+1 fe i=i 

independently of the rescaled {T^ k )( T k+i,[k})'} * ^ \M) that ^ as ^1 as conditional distribution 
given Note also, that the embedding of T, k+ i in the rescaled ^{T k+ i t [ k }) yields 

conditional distribution given A^ Jk+i}}} and that by standard thinning arguments 

these are conditionally independent of ^5 Efc+1 (t), Fj] k+1 (t), i3° 2fe+ ^(t)^ ,0 < t < Cu fc+ iJ given 
^■{[fc],{fe+l}}- Multiplying by F(A {[k]){k+1}} ), this yields the result for % = {[k], {k + 1}}. 

Now consider any other ir = {tti, . . . , ir r } G T^^+i \{l[A;+i] } an d write m = min7r2 — 1 G [fc — 1]. 
Note that also m = min7r2 fl [k] — 1. By the induction hypothesis, the collections C k Te describing 
the spine to the branch point separating [k], and C k r describing the branching and rescaled 
subtrees, are independent. We read and analyse Step 2. of Procedure [2] by marking C k re as we 
marked the Poisson point process in the proof of Proposition [18] and similarly and independently 
selecting a new or old subtree S above v k with probability 



%,B] Sj, Z G B 



Hk,B (<S) 



Then A n is an intersection of two independent events A^ = A^f e n A^ T given by 

A T = {%f = {0} for all < t < CvJ and A^ = {C k (T sel ) = vr (fc+1) n [k]}, 

where C k (S) = {i G [k] : G S} and 7T(fc+i) is the block of it containing k + 1. By construction, 
(C k ve , A v k re ) and (C k r ,A^) are also independent and, since the random variables used to embed 
in T scl are conditionally independent of (C k m ,A r k c ) given T sel , also C k r +1 is independent 
of (C k r ,A^ r ), hence of C k r +1 , since on A^ r , we have C k r +1 = C k r . The distribution of C\ r +l now 
follows from the conditional distribution of T sel given C k r , the recursive nature of Procedure [2] 
and the stability of the procedure under increasing bijections from [j] to other sets BcN with 
#B = j that allows us to apply the induction hypothesis to obtain that the embedding of 
in the rescaled T sel ~ Qf yields a tree with rescaled distribution Qf u ^- k+1 \ as required. □ 

Corollary 22 (Subtree decomposition along a reduced tree) The discrete tree shapes $T_k$,
$k \ge 1$, of the reduced trees $R(\mathcal{T}; \Sigma_1, \ldots, \Sigma_k)$, $k \ge 1$, are labelled Markov branching trees with

$$P(\Pi^{[k]} = \pi) = \frac{1}{\lambda_k} \int_{\mathcal{S}^\downarrow} \kappa_s(\mathcal{P}^\pi)\,\nu_m(ds), \qquad \text{where } m = \min \pi_2 - 1. \tag{11}$$

Conditionally given $T_k$, the processes $((S_B(t), F_B(t), \mathcal{B}^\circ_B(t)), 0 \le t < \zeta_B)$, $B \in T_k$, where we
parametrise $\{v \in \mathcal{T} : C_k(\mathcal{T}_v) = B\} = \{v^\circ_B(t), 0 \le t < \zeta_B\}$ in homogeneous time, are independent
with distributions as in Corollary 20, pushed forward under increasing bijections $[\#B] \to B$.
Conditionally given $T_k$, in particular $\Pi^B = \pi^B = (\pi^B_1, \ldots, \pi^B_{r_B})$ with $m_B = \min \pi^B_2 - 1$, the
variables $(S_B, F_{\pi^B_1}, \ldots, F_{\pi^B_{r_B}}, \mathcal{B}_B)$, $B \in T_k$, are independent of everything else with distribution

X^ R F(U B = 7r) ( 2 Qs(n,-^(dB')f[st n % ie (dx e ) J P mB (ds). 
* V ; distinct t=\ J 

The tree $(\mathcal{T}; \Sigma_1, \ldots, \Sigma_k)$ with $k$ leaves embedded via Procedure 2 is a measurable function of the
random variables $(T_k; ((F_B, S_B, \mathcal{B}_B), ((S_B(t), F_B(t), \mathcal{B}^\circ_B(t)), 0 \le t < \zeta_B)), B \in T_k)$ specified.

Proof of Theorem 6. We will show that Procedure 2 provides an embedding for a restricted
exchangeable hierarchy as in Corollary 5, provided that $\int_{\mathcal{S}^\downarrow} (1 - s_1)\,\nu(ds) < \infty$ and $\nu_j(s_0 > 0) =
c_j = k_j = 0$, $j \ge 1$.

A restricted exchangeable hierarchy is uniquely determined by its restrictions to $[k]$, $k \ge 1$.
But the formula for $\kappa$ in Corollary 5 is identical to (11), hence the hierarchy constructed via
Procedure 2 is a restricted exchangeable hierarchy associated with $(\nu_j, j \ge 1)$ embedded in a
CRT with characteristic pair $(\alpha, \nu)$, as required. □



5 Scaling limits, proof of Theorem 7

5.1 Asymptotics of block numbers in Gnedin's constrained partitions 

Before we describe Gnedin's framework and provide a slight extension of his asymptotic study, 
let us establish the renewal theory result that we need for this. 

Lemma 23 Let $N_t = \#\{n \ge 1 : X_1 + \ldots + X_n \le t\}$ be the renewal process associated with
independent and identically distributed $X_j > 0$. Then for all $p \in \mathbb{N}$

$$\limsup_{t \to \infty} E\left[\left(\frac{N_t}{t}\right)^p\right] < \infty.$$



This is not a deep result, but we have been unable to find it in the literature and hence provide
a proof here. This can no doubt be strengthened to a $p$th moment renewal theorem extending
the first moment Elementary Renewal Theorem, but we will not need such a stronger statement.
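As a numerical illustration of the quantity in Lemma 23, the renewal count and a Monte Carlo estimate of $E[(N_t/t)^p]$ can be sketched as follows. This is only an illustration under our own assumptions: the helper names and the interface `draw(rng)` for sampling one inter-arrival time are hypothetical, and the simulation terminates only because the $X_j$ are strictly positive.

```python
import random
from itertools import accumulate


def renewal_count(increments, t):
    """N_t = #{n >= 1 : X_1 + ... + X_n <= t} for a given finite list of X_j > 0."""
    return sum(1 for partial_sum in accumulate(increments) if partial_sum <= t)


def simulate_renewal_count(t, draw, rng):
    """Draw inter-arrival times until the partial sum exceeds t; return N_t."""
    total, n = 0.0, 0
    while True:
        total += draw(rng)
        if total > t:
            return n
        n += 1


def scaled_moment(p, t, draw, trials, seed=0):
    """Monte Carlo estimate of E[(N_t / t)^p] from independent simulated runs."""
    rng = random.Random(seed)
    samples = (simulate_renewal_count(t, draw, rng) for _ in range(trials))
    return sum((n / t) ** p for n in samples) / trials
```

For example, `scaled_moment(2, 100.0, lambda rng: rng.expovariate(1.0), 1000)` estimates the second moment of $N_t/t$ for a rate-one Poisson process at $t = 100$.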

Proof. The case $p = 1$ follows directly from the well-known Elementary Renewal Theorem for
$E(N_t)$. To prove the general case inductively, we define $q_j(t) = E(N_t^j)$ and consider the strong
induction hypothesis: for all $t \ge 0$,

$$q_j(t) \le \sum_{k=1}^{j} a_{jk} (q_1(t))^k \quad \text{for all } 1 \le j \le p - 1 \text{ and some } a_{jk} \ge 0. \tag{12}$$

This is trivially true for $p = 1$ and $p = 2$. Let $F$ be the distribution function of $X_1$ and $U$ be
the renewal function, i.e.

$$U(t) = \begin{cases} 1 + q_1(t), & t \ge 0, \\ 0, & t < 0. \end{cases}$$

To show the induction step, we condition on the first renewal time $X_1$ and obtain the renewal
equation

$$q_p = F + \sum_{j=1}^{p-1} \binom{p}{j} q_j * F + q_p * F,$$

where $*$ denotes convolution, i.e. $V * W(t) := \int_{\mathbb{R}} V(t - s)\,dW(s)$, in the sense of Stieltjes
integration. Let $F^{*(m)}$ be the distribution function of $T_m = X_1 + \ldots + X_m$, $m \ge 1$. Note that



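$$E[N_t^p] = E\left[\Big(\sum_{m \ge 1} 1_{\{T_m \le t\}}\Big)^p\right] \le \sum_{m=1}^\infty m^p F^{*(m)}(t) = \sum_{n=0}^\infty \sum_{j=1}^k (nk + j)^p F^{*(nk+j)}(t) \le \sum_{n=0}^\infty k((n + 1)k)^p F^{*(nk)}(t)$$

$$\le \sum_{n=0}^\infty k((n + 1)k)^p (F^{*(k)}(t))^n, \qquad \text{for all } k \ge 1,$$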



n=0 



where the last step used the monotonicity of $G = F^{*(k)}$ in a simple estimate of the form

$$G^{*(r-1)} * G(t) = \int_0^t G^{*(r-1)}(t - s)\,dG(s) \le \int_0^t (G(t - s))^{r-1}\,dG(s) \le (G(t))^r, \tag{13}$$

using induction in $r$, but which is also probabilistically obvious in its interpretation in terms of
random variables. Choose $k$ large enough such that $F^{*(k)}(t) < 1$, then we deduce $t \mapsto q_p(t)$ is
locally bounded. Therefore the renewal equation has as its unique locally bounded solution



$$q_p = F * U + \sum_{j=1}^{p-1} \binom{p}{j} q_j * F * U,$$



and particularly q\ = F * U. Then using the induction hypothesis and we obtain 



*,<^+E(-) (tw^^) ^+E ( 



Applying an argument like (fT3|) to G = gi and r = + 1, we see that the induction proceeds. 

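As the Elementary Renewal Theorem guarantees $\mu = \limsup_{t \to \infty} q_1(t)/t < \infty$, this completes
the proof, since now

$$\limsup_{t \to \infty} E\left[\left(\frac{N_t}{t}\right)^p\right] = \limsup_{t \to \infty} \frac{q_p(t)}{t^p} \le \limsup_{t \to \infty} \sum_{k=1}^p a_{pk} \frac{(q_1(t))^k}{t^p} = a_{pp}\,\mu^p < \infty.$$

□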



Gnedin [15] introduced a constrained paintbox based on a strictly decreasing random sequence
$(G_k, k \ge 0)$ in $[0, 1]$ with $G_0 = 1$ and $\lim_{k \to \infty} G_k = 0$. Specifically, he considers a sequence
$(I_n, n \ge 1)$ of independent uniform random variables on $[0, 1]$ independent of $(G_k)$, but then
associates a modified sequence $(I^\psi_n, n \ge 1)$ that is constrained so that its lower records follow
$(G_k, k \ge 1)$ with multiplicities given by a sequence $\psi = (\psi_k, k \ge 1)$:

• set $I^\psi_1 = \ldots = I^\psi_{\psi_1} = G_1$, then we have $n = \psi_1$ modified variables, $K^\psi_n = 1$ record has
attained its multiplicity according to $\psi$ and the next record has been attained $R^\psi_n = 0$
times;

• given $(I^\psi_1, \ldots, I^\psi_n)$, $K^\psi_n = k \ge 1$ and $R^\psi_n = r \in \{0, \ldots, \psi_{k+1} - 1\}$, proceed as follows

- if $I_{n+1} \in [G_k, 1]$, let $I^\psi_{n+1} = I_{n+1}$, $K^\psi_{n+1} = K^\psi_n$ and $R^\psi_{n+1} = R^\psi_n$;

- if $I_{n+1} \in [0, G_k)$ and $r \le \psi_{k+1} - 2$, let $I^\psi_{n+1} = G_{k+1}$, $K^\psi_{n+1} = K^\psi_n$ and $R^\psi_{n+1} = R^\psi_n + 1$;

- if $I_{n+1} \in [0, G_k)$ and $r = \psi_{k+1} - 1$, let $I^\psi_{n+1} = G_{k+1}$, $K^\psi_{n+1} = K^\psi_n + 1$ and $R^\psi_{n+1} = 0$.

Eventually each $G_k$ will appear $\psi_k$ times as lower record in $(I^\psi_n, n \ge 1)$. Let $J_n$
be the number of records attained by the $n$ first terms of the sequence. Gnedin obtains the
asymptotics of $J_n$ when $G_k = Y_1 \cdots Y_k$, where for $k \ge 1$ the $Y_k$, $k \ge 1$, are independent and
identically distributed in $(0, 1)$ with finite logarithmic moments $E[-\log Y_1]$ and ${\rm Var}(-\log Y_1)$.
Here we drop the requirement of the finite logarithmic moments.
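The recursive rules for the modified sequence can be sketched in code. This is a hedged illustration, not Gnedin's or the authors' implementation: it assumes finite lists `G` (with `G[0] = 1`) and `psi` (with `psi[0]` unused), takes the uniform variables as an explicit input list consumed after the initial block, and uses the hypothetical name `constrained_sequence`.

```python
def constrained_sequence(G, psi, uniforms):
    """Build the modified sequence from the rules above.

    G        : strictly decreasing values, G[0] = 1, G[k] for k >= 1
    psi      : multiplicities, psi[k] >= 1 for k >= 1 (psi[0] is a placeholder)
    uniforms : values in [0, 1] driving the steps after the initial block

    Returns (modified, K, R): the modified sequence, the number K of records
    that have attained their multiplicity, and the number R of times the
    next record has been attained so far.
    """
    modified = [G[1]] * psi[1]      # initial block: psi_1 copies of G_1
    K, R = 1, 0
    for u in uniforms:
        if u >= G[K]:               # above the current record level: keep u
            modified.append(u)
        elif R <= psi[K + 1] - 2:   # next record attained, multiplicity not yet full
            modified.append(G[K + 1])
            R += 1
        else:                       # R == psi[K+1] - 1: multiplicity becomes full
            modified.append(G[K + 1])
            K += 1
            R = 0
    return modified, K, R
```

The lists must be long enough that `G[K + 1]` and `psi[K + 1]` exist for every state reached; a full simulation would generate $G_k = Y_1 \cdots Y_k$ lazily instead.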

Lemma 24 Let $G_k = Y_1 \cdots Y_k$, where the $Y_k$, $k \ge 1$, are independent and identically distributed
in $(0, 1)$. If $\psi = (\psi_k, k \ge 1)$ is such that $\psi_k \in \mathbb{N}$, $k \ge 1$, and



{Rt>0} 




o(k), 



as k —¥ oo, 



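then

$$\lim_{n \to \infty} \frac{J_n}{\log n} = \frac{1}{E[-\log Y_1]}$$

in the sense that this limit vanishes when $E[-\log Y_1] = \infty$. Furthermore, for every $p \ge 1$,

$$\limsup_{n \to \infty} E\left[\left(\frac{J_n}{\log n}\right)^p\right] < \infty.$$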



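Proof. The case ${\rm Var}(-\log Y_1) < \infty$, and implicitly also $E[-\log Y_1] < \infty$, has been shown
in the proof of [15, Proposition 8]. We only need to handle the case when $E[-\log Y_1] = \infty$.
Following Gnedin, we define $J'_n = \#\{k \ge 1 : G_k > 1/n\} = \#\{k \ge 1 : \sum_{i=1}^k (-\log Y_i) < \log n\}$.
According to the Renewal Theorem [13, Theorem 4.1, Chapter 3], $J'_n/\log n \to 0$ a.s. when
$E[-\log Y_1] = \infty$. Let $I_{1,n} < \cdots < I_{n,n}$ be the order statistics of $I_1, \ldots, I_n$. Define $\zeta_n$ by
$I_{\zeta_n,n} < 1/n < I_{\zeta_n+1,n}$. According to Gnedin's discussion, $J'_n$ and $\zeta_n$ are independent, $\zeta_n$ is
binomial$(n, 1/n)$ and $J_n \le J'_n + \zeta_n$. By Markov's inequality, we have for all $\varepsilon > 0$,

$$P(\zeta_n > \varepsilon \log n) = P\left(e^{2\zeta_n/\varepsilon} > n^2\right) \le \frac{E\left[e^{2\zeta_n/\varepsilon}\right]}{n^2} = \frac{1}{n^2}\left(1 + \frac{e^{2/\varepsilon} - 1}{n}\right)^n.$$

Thus, we have $\sum_{n=1}^\infty P(\zeta_n > \varepsilon \log n) < \infty$. The Borel-Cantelli Lemma now implies that

$$\lim_{n \to \infty} \zeta_n/\log n = 0 \quad \text{a.s.}$$

This gives us $\limsup_{n \to \infty} J_n/\log n = 0$ when $E[-\log Y_1] = \infty$.
For every $p \ge 1$, note that

$$E\left[\left(\frac{J_n}{\log n}\right)^p\right] \le E\left[\left(\frac{J'_n + \zeta_n}{\log n}\right)^p\right] \le 2^{p-1}\left(E\left[\left(\frac{J'_n}{\log n}\right)^p\right] + E\left[\left(\frac{\zeta_n}{\log n}\right)^p\right]\right).$$

The first term is bounded due to Lemma 23 and the second term converges to 0 because the
moments of $\zeta_n$ are bounded. □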
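The summability used for the Borel-Cantelli step rests on the binomial moment generating function and the elementary bound $(1 + c/n)^n \le e^c$, which gives a tail bound of order $n^{-2}$. The following sketch checks both numerically; the function names are ours, and the code is a sanity check of these two standard facts rather than part of the proof.

```python
import math


def binomial_mgf_closed_form(n, theta):
    """E[exp(theta * Z)] for Z ~ binomial(n, 1/n), in closed form."""
    return (1.0 + (math.exp(theta) - 1.0) / n) ** n


def binomial_mgf_direct(n, theta):
    """The same expectation, summed directly over the binomial weights."""
    p = 1.0 / n
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k) * math.exp(theta * k)
               for k in range(n + 1))


def markov_tail_bound(n, eps):
    """Upper bound on P(Z > eps * log n) obtained from Markov's inequality."""
    return binomial_mgf_closed_form(n, 2.0 / eps) / n ** 2


def summable_envelope(n, eps):
    """exp(e^{2/eps} - 1) / n^2, which dominates markov_tail_bound and is summable in n."""
    return math.exp(math.exp(2.0 / eps) - 1.0) / n ** 2
```

Since the envelope is a constant multiple of $n^{-2}$, the series of tail probabilities converges for every fixed $\varepsilon > 0$.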



5.2 Special branch points and their asymptotics 

We consider the setting of Theorem 3 where for some fixed $m \ge 1$, we have $\nu_j = \nu_m$ for all
$j \ge m$. In this setting, the selection probabilities of Section 4.3 for $k \ge m + 1$ become

$$P^{\rm old}_k(s, s_i) = s_i \quad \text{and} \quad P^{\rm new}_k(s, s_i, s_j) = s_j.$$

It is now easy to see that the sampling procedure in $(\mathcal{T}, \mu)$ can be simplified in this setting so
as to combine for each $k \ge m$ the steps until $\#B' < m$ into a single selection according to $\mu$.
Procedure 3 Use the steps of Procedure 2, but replace for $k > m$ steps $(k, [k - 1])$ 1.-2. by
1'. We are provided with $(\Sigma_i, i \in [k])$ and sample $\Sigma^*_{k+1} \sim \mu$.

2'. We consider the spine $[[\rho, \Sigma^*_{k+1}[[ = \{\Sigma^{*\circ}_{k+1}(t), t \ge 0\}$ in homogeneous time and set

%,B> = 3^.)^), where r fc * = inf{i > : )(*)) < = } (r fc *)), 

where $C_k(\mathcal{S}) = \{i \in [k] : \Sigma_i \in \mathcal{S}\}$ is the set of labels in $\mathcal{S} \subset \mathcal{T}$.

Theorem 7 describes the convergence of unlabelled trees. In fact, more is true and it will be
instructive to study approximations of the spines $[[\rho, \Sigma_j[[$, $j \ge 1$, in $(\mathcal{T}; \Sigma_i, i \in \mathbb{N})$ by discrete
spines $\{B \in T_n : j \in B\}$, $n \ge j \ge 1$. In the proof of Theorem 7, we will need to control these
uniformly in $j \ge 1$. In the exchangeable case of [20], these spines can be regarded as independent
uniform samples from a strongly sampling consistent regenerative interval partition. In the
restricted exchangeable case here, the analogous partitions are no longer regenerative (except
for $j = 1$, and for $j = 2$ if $m = 2$) and the sampling is not uniform. However, both features are
still present on parts of the spine and we will cut the spines at certain special branch points.

Fix j > 1. A branch point v G [[p, Ej[[ is called special in (T; Ej, i £ N) for [[p, Ej[[ if some 
or all of the m smallest labels C(T V ) in the bush T v above v are not included in the subtree 
T% Jd(p,v)) above v containing Ej. Note that a branch point is special if the m smallest labels 
split or if j splits from the m smallest labels. Therefore, a branch point that is special for [[p, Ej[[ 
and an element of [[p, Ej/[[ for some j' < j may not be special for [[p, Ej/[[. For the analogous 
notion in (T; Ej,i G [n]), for n > j, we write 

$$N_n^{(j)} = \#\{v \in [[\rho, \Sigma_j[[ \,:\, v \text{ is special for } [[\rho, \Sigma_j[[ \text{ in } (\mathcal{T}; \Sigma_i, i \in [n])\}$$

for the number of special branch points, and = inf{t > : #£ n (7fy jt)) < m} for the 
homogeneous time when the label set first has fewer than m elements. The significance of this 
time is that up to this time, all branch points that are special in (T; Ej, i G N) will also be special 
in (T ; Ej, i G [n]), but this fails afterwards. We introduce = inf{t > : E n 7^, .)(£)}, the 
homogeneous time when E n leaves the spine [[p,Ej[[. Recall A*(z) = v*(S l x (0,e~ z )) and also 
set Afc(2;) = ?bush(^ x e_Z ) x A; > 1, in the notation of Lemma [lH and Corollary [20l 

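Proposition 25 Let $(\nu_i, i \ge 1)$ and $\nu$ be as in Theorem 7 and $(\mathcal{T}; \Sigma_i, i \in \mathbb{N})$ an embedding 
according to Procedure 3. Suppose furthermore that there is $m \ge 1$ with $\nu_j = \nu_m$, $j \ge m$, and 
that $\nu_m(s_1 \le 1 - \varepsilon) = \varepsilon^{-\alpha}\ell(1/\varepsilon)$. Then, 

(i) for all $j \ge 1$, we have $N_n^{(j)}/(n^{\alpha}\ell(n)) \to 0$ a.s. as $n \to \infty$; 

(ii) for every $p \ge 1$, we have $\limsup_{n\to\infty} \mathbb{E}\big[(N_n^{(n)}/\log n)^p\big] < \infty$; 

(iii) for every $p \ge 1$, there exists a constant $C_p^{\rm spec}$ such that for all $1 \le j \le n$ and $x > 0$ 

$$\mathbb{P}\big(N_n^{(j)} > 2x\max\{\Lambda_1(n^{-1}), \Lambda_*(n^{-1})\}\big) \le \frac{C_p^{\rm spec}}{x^p n^{\alpha p - 1}}.$$
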

Proof, (i) Let us consider Nn first. We will study the asymptotics by relating this to the 
setting of Lemma [2H and will use the notation there once we have identified (Gk, k > 1) and ip. 
Recall that A (El) (U. (1) ), i > 2, are the residual masses of the subtrees containing Ei when Ej 
has left the spine [[p, Si[[. Let Y k , k > 1, be independent copies of -XWy (r^ ) , the residual mass 
of the subtree containing Si at the branch point separating [m], and G k = Y± ■ ■ ■ Y k , k > 1. 
We introduce the filtration 

Jf\t) = a ((^(a),^ W.B^jW, A,^)))^ < , * > 0, 

of the spinal Poisson point process , Fg 1 , studied in Proposition [181 augmented by 

label sets of spinal bushes derived from embedded leaves Si, ... , S n . 

Let Hn^ = j^{r^ ,m < i < n}. Then = 1 is the initial state, we will also consider 
(T^\x m {T^ ) )^C m {T^ i) (T^ ) ))). Now let n > m + 1 and write Vf = minjr^, v}?\ 
n > 1. Conditionally given ^(r^), in particular (X (Sl) (F^), . . . , X^V^)), = k 

and #£ rt _i(7^ i ^(r^j > 1 )) = £, the argument to establish Procedure [3] can be used to simplify 
Procedure [2] to only combine the steps until 1 £ B' or j^B' < m; so sample a leaf S* ~ /i, define 
V$ = inf{t > : 1 £ £ n _ x (T ( ° ,,)(*))} and 

- if <i < r^i, set T n _ hB , = Tfc^vg}), note = k, r' 1 ' = r£l v #£„(T ( ° ^r^)) = I; 

- if V$ > r^i and £ < m, set T n _ liB , = Tg^r^i), note = k, tP = t£} v 

- if Vn$ > t^} x and I = m, then sampling of S n in the rescaled subtree / ^Si)( T 7i-i) * s 
independent of ^"^-l^n-l) an( ^ by the same procedure clS S m IS sampled in T, therefore 



note ffP = fc+1, rW-r^i = independent of ^(r^), and #£ n (T ( ° E) (ri 1) )) < 



m. 



The third case suggests to consider (^k,k > 1) = (m — #'Ciy fc+1 (^"(s 1 )('?"^ +1 )) 5 ^ > l)i where 
W k = inf{n > 1 : = k}, independent of {G k , k > 1). As (G k , k > 1) = (X (Sl) (F^), fc > 1), 

it is now straightforward to show that the dynamics of and J^-m+i are the same, hence 
there exists a sequence {h,i > 1) of independent uniform random variables on [0,1] and an 
independent random sequence each member taking values in [m] such that for all n > m 

(Hg\..., H M)±(j?,...,J*_ m+ l)- (14) 

Now note that n £ N with W k < n < W k +i can only yield a new special branch point if 
V^ 1 * > t^_i, i.e. in the middle case of the procedure above, but after at most m — 1 such steps, 
the third case will apply and Hn^ will increase. Therefore, 

Njp <m#HW. (15) 

Lemma [231 ensures H$/]ogn-> l/E[-lo g yi], therefore /(n a l(n)) a.s. as n — > oo. 

Now consider Nn ■ For j > 2, let Uj = S^(A^) be the spinal branch point at homogeneous 
time A° = inf{i > : Cj(T^^(t)) = {J}}, when j becomes the smallest label of its block; 
note that Si, . . . , S 3 _i are in spinal bushes of [[p, Sj]] below or at Vj. Clearly #£ n (7^ .^(Aj)) < 
$n - j + 1$, so the number of special branch points on $]]v_j, \Sigma_j]]$ will be no larger than $\widetilde{N}^{(1)}_{n-j+1}$, where 
$(\widetilde{N}^{(1)}_k, k \ge 1) \overset{d}{=} (N^{(1)}_k, k \ge 1)$. The number of special branch points on $[[\rho, v_j]]$ is $N_j^{(j)}$. Therefore 

$$N_n^{(j)} \le N_j^{(j)} + \widetilde{N}^{(1)}_{n-j+1} \le j + \widetilde{N}^{(1)}_{n-j+1}. \qquad (16)$$

Hence the convergence for $N_n^{(j)}/(n^{\alpha}\ell(n))$ follows. 

(ii) To study Nn , we will identify new families (Gk,k > 1) and ifi different from the ones 



in (i) and again apply Lemma [2^1 Let = ^(xi') be the first special branch point in the 

spine [[p, Sj]]. By Procedure^! X(£-)(Xx^) = ^{T,*){Xi^) f° r an j > m + 1. Also, note that Xi 
is determined by (T; G [m]; Sp. As X* is sampled according to /i in T, we have 



Of,,(i)> 



(m+l)> 



Xm+lh 



(17) 



Let Ifc, fc > 1, be independent copies of Js m+1 (Xi™^) an d consider a constrained painbox 
associated with Gi~ = Y\ ■ ■ ■ Y^, k > 1, also ^ = 1, k > 1. We claim that for all n > m + 1, and 
every x > 0, 

$$\mathbb{P}\big(N_n^{(n)} - m + 1 > x\big) \le \mathbb{P}\big(J^*_{n-m} > x\big). \qquad (18)$$

This formula holds for $n = m + 1$ as $N_{m+1}^{(m+1)} - m + 1 \le J^*_1 = 1$. Suppose (18) holds for all 


n < j — 1. For n = j, the first special branch point b± on the spine [[p, Sj]] is located on 
the spine [[/?, b^]]. For z = m + 1, . . . , j — 1, let 7^ ^(V^ A Xi ) be the spinal subtree of T 
containing Xj rooted on a branch point on the spine [[p, b^]], possibly at 6j itself. By Procedure 



El S* € 7"(2.)(v A xi )• We can express the number Mf of leaves in {X 

(7) 

belonging to the subtree containing Xj above branch point b^ as 



m+l j 



M 



(.7) 



#{i G {m+l, . . . : Si G ^(xHI = #{i G {m+l, . . . , j-1} : X* G ^(x^)}- 



tor 



As are sampled according to p and ) = ^(s ra+1 )(xi m ) = Y 1: by 

(Hzd, 



E 
E 



j — m — 1 
j — m — 1 



^.)(X?)) (1-%)^) 



j— m— A:— 1 



for all < k < j — m — 1, where M ? _ m is the number of , . . . , / 7 _ m hitting the interval (0, Gi). 



Let Nj (Xi\°°) = -^i^ — 1 be the number of special branch points in ]]bf Ej]] 3 and 
^/_ m (0, *i) = Jf_ m - 1- Given M[ j) = k, we have #£j(T ( ° .^X?)) < k + m < j - 1. Hence, we 



(.7) 



obtain by the induction hypothesis applied to the rescaled (T^.^Xi )) Si, i G #Cj(T^ .^(Xi^))) 



N^\xx\oo) -m+l> x 



M 



(j) 



< 



«/;_ m (o,y 1 )>x 



fe), 



and then 



U) 



m + 1 > x 



= E 
< E 



N^\xi\oo) -m + l> x-1 



Al- 



ii) 



jf_ m (0,Y 1 )>x-l 



Jj- m > X 
Now (18) is clear and we deduce that $\mathbb{E}\big[(N_n^{(n)} - m + 1)^p\big] \le \mathbb{E}\big[(J^*_{n-m})^p\big]$ for every $p \ge 1$. 

The result in (ii) now follows from Lemma 24. 

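(iii) Formula (16) implies that for every $p \ge 1$ and $x > 0$ and $z_n = x\max\{\Lambda_1(n^{-1}), \Lambda_*(n^{-1})\}$ 

$$\mathbb{P}\big(N_n^{(j)} > 2z_n\big) \le \mathbb{P}\big(N_j^{(j)} > z_n\big) + \mathbb{P}\big(\widetilde{N}^{(1)}_{n-j+1} > z_n\big) \le \frac{\mathbb{E}\big[(N_j^{(j)})^p\big]}{z_n^p} + \frac{\mathbb{E}\big[(\widetilde{N}^{(1)}_{n-j+1})^p\big]}{z_n^p} \le C_p\frac{(\log n)^p}{z_n^p}.$$

The last line is obtained by Markov's inequality. Lemma 24 and the result in (ii) give the upper 
bound for the first probability, while (15) together with Lemma 24 gives the upper bound of the 
second one. As $\Lambda_1(y) \sim y^{-\alpha}\ell(1/y)$ and $\Lambda_*(y) \sim y^{-\alpha}\ell(1/y)$ as $y \downarrow 0$, the result in (iii) follows. □ 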

Procedure 3 and the notion of special branch points are also useful to show that the embedding 
uses the whole CRT $(\mathcal{T}, \mu)$ and does not leave any subtrees of positive mass unlabelled. 
One way of making this precise is to say that the reduced trees converge to the CRT: 

Proposition 26 In the setting of Procedure 3 we have 

$$R(\mathcal{T}; \Sigma_i, i \in [k]) \to \mathcal{T} \quad \text{a.s. in the Gromov-Hausdorff sense as } k \to \infty.$$

Proof. Let e > 0. Consider [[p, Si[[ and the associated spinal mass partition [21]. Here we 
denote by f| p the distribution on of the masses of spinal subtrees that are greater than e. 
Let = inf{t > : ptfLAt)) < e}. Note that Wi := inf{n > 1 : T n 1] > a { £ 1] } < oo a.s., by 
the previous proof. By Procedure^ leaves E* and E n are in the same subtree of [[p, E5(cr e )]] for 
each n > W±, in particular each subtree of mass greater than e is selected with an asymptotic 
frequency greater than e. Inductively, we use Corollary [20] and leaves selected according to 
Procedure [3] to further split according to scaled f| p each subtree of mass greater than e. 

After a finite number of steps, all subtrees have mass less than e, e.g. because a homogeneous 
mass fragmentation process (Ft,t > 0) in with finite dislocation measure z^| p satisfies Ft — ► 
as t — ► oo, see e.g. [3 Equation (4)], and so only has finitely many splits before |iq(i)| < e. □ 

Using arguments of [30, Corollary 23], we can also show joint a.s. convergence in the Gromov-Prohorov 
sense of weighted trees $\big(R(\mathcal{T}; \Sigma_i, i \in [n]),\ n^{-1}\sum_{i=1}^{n}\delta_{\Sigma_i}\big) \to (\mathcal{T}, \mu)$. 

5.3 Convergence of reduced trees and large deviation estimates for spines 

By Corollary 22, reduced trees $R(\mathcal{T}; \Sigma_i, i \in [k])$ of self-similar CRTs with labelled leaves embedded 
according to Procedure 3 can be assigned subtree masses on edges (parts of spines) in 
terms of Poisson point processes and associated spinal subordinators, and away from existing 
leaves, sampling of new leaves is according to subtree masses. To study the asymptotics of the 
number of spinal branchpoints, we will need the following refinement of results in [16, 20]. 

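Lemma 27 Let $\xi = (\xi_t, t \ge 0)$ be a pure jump subordinator with Lévy measure $\Lambda$ satisfying 
$\Lambda([x,\infty)) = x^{-\alpha}\ell(1/x)$, $x \downarrow 0$. Let $(\varepsilon, \tau, \tau')$ be any random variables on $[0,\infty)^2 \times [0,\infty]$ with 
$\tau \le \tau'$. Let $(V_i, i \ge 1)$ be any random variables conditionally independent given $(\xi, \varepsilon, \tau)$ with 

$$\mathbb{P}(V_i \le \tau \mid \xi, \varepsilon, \tau) = 1 - e^{-\varepsilon} \quad\text{and}\quad \mathbb{P}(V_i > \tau + v \mid \xi, \varepsilon, \tau) = e^{-\varepsilon - \xi_v}, \quad v \ge 0,$$

and $K_n(\varepsilon, \tau, \tau') = \#\{V_i : 1 \le i \le n,\ \tau < V_i < \tau'\}$. Then 

$$\lim_{n\to\infty} \frac{K_n(\varepsilon,\tau,\tau')}{n^{\alpha}\ell(n)\Gamma(1-\alpha)} = \int_0^{\tau'-\tau} \exp(-\alpha(\varepsilon + \xi_v))\,dv \quad \text{a.s.}$$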






If furthermore $\Lambda([xy,\infty)) \le C_\Lambda y^{-\varrho}\Lambda([x,\infty))$ for all $y \ge 1$ and $0 < x \le 1$, and some $\varrho > 0$, then 
there is a constant $C_p$ for all $p > 1/\alpha$, such that for all $x \ge 1$, $n \ge 1$ and all $(\varepsilon, \tau, \tau')$ as above, 
but with the additional property that $\tau' = \tau + \tau''$ for a stopping time $\tau''$ for a filtration, in which 
$\xi$ is a subordinator, 



\n a £(n)r(l — a) J 



Cp 



-i ' 



(19) 



where Y{e, r, t') = 1 + (1 + ^ Q )C A ^ exp(-o(e + £)) twtt A a = 2^ ^ ■ 

i=o j=i J ^ + - 1 

This lemma is an extension of [20, Lemmas 8 and 12], which we recover as the special case 
$\tau = \varepsilon = 0$ and/or $\tau' = \infty$. The proof is also essentially the same, but since this result is more 
general, we reproduce the proof rewritten in the present generality in the appendix. 

Proposition 28 Let $\nu_1, \ldots, \nu_m$ be conservative with $\nu(s_1 \le 1 - \varepsilon) = \varepsilon^{-\alpha}\ell(1/\varepsilon)$, where $\nu$ is as in 
Theorem 7 with $\nu_j = \nu_m$, $j \ge m$. Let $R(\mathcal{T}, \Sigma_1, \ldots, \Sigma_n)$ be an $\mathbb{R}$-tree sampled from a self-similar 
CRT $(\mathcal{T}, \mu)$ with index $\alpha$ and dislocation measure $\nu$ by Procedure 3, and let $(T_n)_{n \ge 1}$ be the associated 
labelled discrete restricted exchangeable Markov branching trees with unit edge lengths. Then 

$$\frac{R(T_n, [k])}{n^{\alpha}\ell(n)\Gamma(1-\alpha)} \xrightarrow[n\to\infty]{\text{a.s.}} R(\mathcal{T}, \Sigma_1, \ldots, \Sigma_k) \quad \text{in the sense that all edge lengths converge.}$$

In particular, the delabelled trees $(R(T_n, [k]))^{\circ}$, $n \ge k$, converge in the Gromov-Hausdorff sense. 

Proof. Consider $k = 1$ and denote by $D_n^{(1)}$ the length of $R(T_n, \{1\})$. 

If v\ = ■ ■ ■ = v m —\ = 0, then Si, . . . , S m are always in the same subtree in T, then t[ = 

■•• = Tm' = 00. Conditionally on the subordinator £ Sl associated with leaf Si, the leaves 
S m+ i, . . . ,S n are sampled according to p along the spine [[p, Si[[. Hence applying Lemma [271 
the convergence result is straightforward. 

Now suppose that at least one of v%, . . . , v m -\ is non-zero. By Procedure El each S, is either 
placed in the same subtree of [[p, Si[[ as S* ~ p or contributes a special branch point. Now 

D ( n ] = # 1 < i < n] < 1 + JVW + # [vff, 2 < i < n} , (20) 

with = inf{t > : 1 C n -\(TfL»\{t))}, where Lemma [271 yields the asymptotics of 

^i,-l(0,0,oo) = #{V^* , 2 < i < n}. Together with the asymptotics of Nn^ obtained in 
Proposition [25], this yields 

D$ f°° s 

hmsup -</ exp(-a^ 1 )dt a.s. (21) 

n-*<x n a £{n)T{\ - a) J 

On the other hand, no special branch points are created for n>/ + l>m + 2 below , so 
D$ > # {v} 1] : < < r ; (1) , I + 1 < i < n} = # {v^ : < f£ } < r, (1) , J + 1 < % < nj . 
At least one of Vj 7^ 0, j < m — 1, so r„ < 00. By the proof of Proposition 1251 — > 00, so 

liminf — ^ ——, r> sup liminf 1 ' — ——, r = / expf— a£+ )dt. 

n-,00 n a £(n)T(l - a) ~ ,> m +i n^oc n a £(n)T(l - a) J PV ^ ' 

Combining this with (|2ip . the convergence for follows and establishes the result for k = 1. 
Next, consider $k \ge 2$ assuming the result for $1, \ldots, k-1$. For the branch point adjacent to $\rho$ 
in R(T, Si, . . . , S&), set = d(p, Vk), with homogeneous time ( Vk given by 



Jo 



Let Dn^ be the height of the branch point adjacent to the root in R(T n , [k]), then D$ — 1 is the 
number of distinct branch points of R(T, Si, ... , S n ) belonging to [\p, Vk[[, i.e. 

D$ = 1 + : < < ( Vk ,k + 1 < i < n}. 

If 2 < k < rre, then 1 < Din < m — 1 and, by the same argument as for k = 1, 

*ffJ m (0,0,C„J = : V™ < Cv k ,m+1 < i < n} < <m + K n k } m (0,0,Cv k ). (22) 

If k > m + 1, then £>jf ] = 1 + #{V$ : vff < ( Vk , k + 1 < i < n}. In all cases, by Lemma [27] 

n [k] f<*k 

J-^n a.s. / 



n«£(n)r( 



r / exp(-<f 1 )d S = I>W. 

1 - «) n-+oo _/ 



So the renormalized length of the root edge of R(T n , [k]) converges as required. 

Now argue conditionally given that [k] is first separated into = (irx, . . . , w r ). For all 
> k + 1 and 1 < j < r, denote by Bj(n) = C n {T^) D ttj the jth block of the partition at 



n 



in (T; Sj, i € [re]), and by T n k j the corresponding subtree of T n . By Lemma [21] Procedure [3] and 
the Strong Law of Large Numbers, 

#Bj{n) a.s. (rr [kK . . 

n n— >oo J 

and the Induction Hypothesis yields convergence of the remaining edge lengths, for 1 < j < r 



n a £(n)r(l-a) n Q £(n) (#Bj(ji))H(#Bj{n))Y(l - a) 

(M^^m^jS^Gvr,), 

n— >oo J J 

in the sense that all edge lengths converge, which implies Gromov-Hausdorff convergence. □ 

While the arguments of the analogous but much more specific [30, Proposition 22] do not 
apply here in cases where the densities $d\nu_j/d\nu$ are degenerate, we can now deduce from our 
Proposition 26 that in the setting of Proposition 28 here, delabelled trees converge a.s. when 
taking double limits 

$$\lim_{k\to\infty}\lim_{n\to\infty} \frac{(R(T_n, [k]))^{\circ}}{n^{\alpha}\ell(n)\Gamma(1-\alpha)} = \mathcal{T} \quad \text{in the Gromov-Hausdorff sense a.s.} \qquad (23)$$

Theorem 7, instead of restricting to $[k]$, then letting $n \to \infty$ and then $k \to \infty$, considers $n \to \infty$ 
directly, at the cost of weakening the mode of convergence to convergence in probability. To 
prepare the proof of this otherwise stronger Theorem 7 we study the spines $[[\rho, \Sigma_j[[$, $j \ge 1$. 

Recall that we denote by $\Lambda_1$ and $\Lambda_*$ the Lévy measures of the subordinators $\xi^{\Sigma_1}$ and $\xi^{\Sigma^*}$ 
generated, respectively, by the first embedded leaf $\Sigma_1$ and by a leaf $\Sigma^*$ sampled according to $\mu$. 
For $k \ge 1$ and $n \ge k$, denote by $D_n^{(k)}$ the length of $R(T_n, \{k\})$. 
Lemma 29 For all $p > 0$, there is a constant $C'_p > 0$ such that for all $k \ge 1$, $n \ge k$ and $x \ge 1$ 

$$\mathbb{P}\big(D_n^{(k)} > 2(1+x)(2+Z_k)\max\{\Lambda_1(n^{-1}), \Lambda_*(n^{-1})\}\big) \le \frac{C'_p}{x^p n^{\alpha p - 1}},$$



where Z k = m + (1 + A a ) max-fC^, Ca*} I 771 + (-^(s fc )(^)) e J a ^ moments finite. 

\ i=0 J 

Proof. For fc = 1, we use ([20]) to write D$P < 2(£>1 1) - 1) < 2ivP +2^ 1 (0, 0, oo) and deduce 
from Proposition [25] and Lemma [271 that for all p > and all n > 1, x > 1, 

P > 2(1 + x)(2 + ZijSifn- 1 )) 

< P (JVW > (1 + x)2A 1 (n- 1 )) + P (^(0, 0, oo) > (1 + s^A^n" 1 )) < ■ 

Next, consider 2 < k < m. Recall that we denote by £fc(<S) = {i £ [k] : G 5} the set of 
labels in a subtree <S C T. We set 7?^ = and split the spine [[p, at homogeneous times 
= inf{i > : ^C k {T^At)) < j} for k — 1 > j > 1, some of which may coincide. Repeated 
application of Corollary 1201 Lemma [2T1 and arguing as for (|20p yields that 



D w < 2( D« - 1) < 2^) + 2^ (e*(i[ k) )A k \oo) +y. 2K u^ (e k d\ 



J=2 



(fc)\ (fc) (fc) 



where < fc _f (^(if) 

' Tj^'TI-i) * s as ™ Lemma [27] but here associated with the subordinator 
= £ Sfc >j'(7^ +•) — ^ Sfc ' j (7j fc ' > ) that has Levy measure A^ k ^ = A\ and with random variables 
y/ fcj) = inf{t > : ^ .)(<)}, i > 1, where E kJ = X e if I = min£ fc (7j fc) ( 7 f ))). 

Distribute most of Z fe onto ^ W (7_f ^j^) = ! + (! + A Q )C Al exp(-^f k ). Then 



P (l)< fc ) > 2(1 + x)(2 + Z fc )Ai(n -1 )) < P (n<® > 2(1 + x)A 1 ( n - 1 ) 



+ E^Ki(e Efc (7f ),7f ) ,7f_ ) 1 ) > {l + x)Z^\f\ 1 f} l )A 1 {n^) 
3=1 

and conclude again by Proposition 1251 and Lemma [271 with constant Cp PCC + kCp . 

Now consider > m + 1. We set 7^1 1 = and 7q = 00. We split [[p, at homogeneous 
times 7^ = inf{i > : S* ^ 7§ fc) (t)} and 7 j fc) = inf{i > 7$ : #C k {7^Jt)) < j} for 



m — 1 > j > 1. Note that, by Procedure El \(7m )) < 771 • Again 



D W < 2(j D« - 1) < 2tf« +J22K^ (sf\f\f\) + 2K££(p, 0, 7 «), 



where = £ fc (7: ), other notation as for k < m, and 7Q_^(0, 0, 7m ) is as in Lemma[271 here 



based on the subordinator with Levy measure A*, and = inf{t > : S* T®« At)}, 



the homogeneous time when St and S* are first in different subtrees. We get 
F(D^ > 2(l + x)(2 + Z fc )max{Ai(n- 1 ),A*(n- 1 )) < P (V W > 2(1 + x)A 1 (n- 1 ) 



+ EFK-£(^ fc (7f),7f ,7^) > (l + x)Z«(7« 1 7j!? 1 )Ai(n- 1 )) 



+p (^(o,o, 7 «) > (l+zjzwco^wjrcn- 1 ; 

and conclude again by Proposition [25] and Lemma [271 with constant C' p = Cp pec + mCp + C*. 
Let $H_\varrho$ be the height of the $\varrho$-self-similar CRT $(\mathcal{T}_\varrho, \mu_\varrho)$ obtained from $(\mathcal{T}, \mu)$ by $\varrho$-self-similar 
time-change. By [18, Proposition 14], the height $H_\varrho$ has exponential moments and so does $Z_k$: 

supZ fc < m+ (1 +A a )-max{C Al ,C A *} ( m + sup / (X (Sfc) (i)) e eft 
fc>i y fe>i Jo y 



< m + (1 + A a ) max{C Al , C A * } (m + 



5.4 Proof of Theorem 7 

The previous sections contain the new developments that we need to apply the techniques 
developed in [20] for the exchangeable case in the higher generality of Theorem 7. We only 
briefly retrace this argument here so as to identify the places where a result in the previous 
sections here replaces a more specific result of [20]. 



Lemma 30 (Lemma 10 and Corollary 11 of [20]) Let H n = maxi<fc<„ D„ be the height 
of T n . Then there is a constant C Pja for all a > 0, p > 2/ a, such that for all x > 1 and n > 1 



11,1 > ax) < ° p ' a 



,max{Ai(n _1 ), A (n^ 1 )} / xP 
The proof is based on Lemma [5U] replacing |20[ Lemma 12]. 

Lemma 31 (Proposition 9 of [20]) Under the hypotheses of Theorem 7, let for $n \ge k$ 

$$\Delta(n,k) := \max_{1 \le i \le n} d_n(\{i\}, R(T_n, [k])),$$

$d_n$ being the metric associated with $T_n$. Then for each $\eta > 0$, 

$$\lim_{k\to\infty}\limsup_{n\to\infty} \mathbb{P}\left(\frac{\Delta(n,k)}{\max\{\Lambda_1(n^{-1}), \Lambda_*(n^{-1})\}} > \eta\right) = 0.$$

The proof is based on Proposition 26 or (23) replacing "clearly", Corollary 22 replacing [20, 
reference [10] there, Lemma 3.14], and Lemma 30 replacing [20, Corollary 11]. 

Proof of Theorem 7. This proof is now based on (23) replacing [20, reference [29]], Lemma 
31 replacing [20, Proposition 9], Proposition 28 replacing [20, Proposition 7]. □ 
6 Examples: skewed Poisson-Dirichlet models 

6.1 The alpha-gamma model as restricted exchangeable hierarchies 

The alpha-gamma model is a consistent family $(T_n, n \ge 1)$ of random hierarchies of $[n]$, 
$n \ge 2$, whose distributions $(Q_n, n \ge 2)$ are such that the conditional distributions of $T_{n+1}$ given 
$T_n$ are particularly simple. To describe these, recall (see e.g. [26]) that consistency $t_n = t_{n+1} \cap [n]$ 
for two hierarchies $t_n$ of $[n]$ and $t_{n+1}$ of $[n+1]$ means that there is a unique $B \in t_n$ such that 
either $B$ splits into two vertices $B$ and $B \cup \{n+1\}$, $\{n+1\}$ is added and some vertices renamed 

$$t_{n+1} = t_n^{B\text{-edge}} = \{A \cup \{n+1\} : A \in t_n, B \subseteq A\} \cup \{A : A \in t_n, B \not\subseteq A\} \cup \{B, \{n+1\}\}$$

and we then say that $n+1$ is inserted into the edge below vertex $B \in t_n$, or 

$$t_{n+1} = t_n^{B\text{-vertex}} = \{A \cup \{n+1\} : A \in t_n, B \subseteq A\} \cup \{A : A \in t_n, B \not\subseteq A\} \cup \{\{n+1\}\}$$

and we then say that $n+1$ is inserted into the internal vertex $B \in t_n \setminus 0_{[n]}$. 

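For parameters $0 \le \alpha \le 1$ and $0 \le \gamma \le \alpha$, we specify $T_1 = \{\{1\}\}$, $T_2 = 0_{[2]} \cup 1_{[2]}$ and for 
$n \ge 2$ 

$$\mathbb{P}\big(T_{n+1} = t_n^{B\text{-edge}} \,\big|\, T_n = t_n\big) = \frac{1-\alpha}{n-\alpha} \quad \text{for all } B \in t_n \text{ with } \#B = 1,$$

$$\mathbb{P}\big(T_{n+1} = t_n^{B\text{-edge}} \,\big|\, T_n = t_n\big) = \frac{\gamma}{n-\alpha} \quad \text{for all } B \in t_n \text{ with } \#B \ge 2,$$

$$\mathbb{P}\big(T_{n+1} = t_n^{B\text{-vertex}} \,\big|\, T_n = t_n\big) = \frac{(k_B-1)\alpha-\gamma}{n-\alpha} \quad \text{for all } B \in t_n \text{ with } \#B \ge 2,$$

where $k_B + 1$ is the degree of vertex $B \in t_n$, or equivalently $k = k_B$ the number of blocks of the 
partition of $B$ into maximal subsets $A_1, \ldots, A_k$ of $B$ in $t_n$. Let $Q_n^{\alpha,\gamma}$ be the distribution of $T_n$. 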
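
As a minimal sketch of the growth rule just described, the following Python code stores a hierarchy as a frozenset of frozensets and implements the two insertion moves together with their weights. The function names (`children`, `insert_edge`, `insert_vertex`, `growth_weights`, `sample_tree`) and the `seed` argument are illustrative assumptions, not notation from the text. Exact fractions can be passed for the parameters so that the total weight $n - \alpha$ can be checked without rounding.

```python
import random
from fractions import Fraction


def children(t, B):
    """Maximal proper subsets of B in the hierarchy t (the blocks A_1, ..., A_k)."""
    proper = [A for A in t if A < B]
    return [A for A in proper if not any(A < C for C in proper)]


def insert_edge(t, B, new):
    """Edge insertion below B: grow every A containing B, keep B, add {new}."""
    grown = {A | {new} if B <= A else A for A in t}
    return frozenset(grown | {B, frozenset({new})})


def insert_vertex(t, B, new):
    """Vertex insertion at B: grow every A containing B, add {new}."""
    grown = {A | {new} if B <= A else A for A in t}
    return frozenset(grown | {frozenset({new})})


def growth_weights(t, alpha, gamma):
    """Unnormalised weights of all moves for a hierarchy t of [n]."""
    moves = []
    for B in t:
        if len(B) == 1:
            moves.append((("edge", B), 1 - alpha))       # leaf edge
        else:
            k = len(children(t, B))
            moves.append((("edge", B), gamma))           # inner (or root) edge
            moves.append((("vertex", B), (k - 1) * alpha - gamma))
    return moves


def sample_tree(n, alpha, gamma, seed=0):
    """Sample T_n for n >= 2, starting from T_2 = {{1}, {2}, {1, 2}}."""
    rng = random.Random(seed)
    t = frozenset({frozenset({1}), frozenset({2}), frozenset({1, 2})})
    for m in range(2, n):
        moves = growth_weights(t, alpha, gamma)
        keys = [key for key, _ in moves]
        weights = [float(w) for _, w in moves]
        kind, B = rng.choices(keys, weights=weights)[0]
        insert = insert_edge if kind == "edge" else insert_vertex
        t = insert(t, B, m + 1)
    return t
```

The weights sum to $n - \alpha$ because the $n$ leaf edges contribute $n(1-\alpha)$ and each internal vertex $B$ contributes $\gamma + (k_B - 1)\alpha - \gamma = (k_B - 1)\alpha$, with $\sum_B (k_B - 1) = n - 1$.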




Figure 1: Alpha-gamma growth rule: displayed is one vertex $B$ of $T_n$ with degree $k + 1$, hence 
vertex weight $(k-1)\alpha - \gamma$, with $k - r$ leaves $L_{r+1}, \ldots, L_k \in [n]$ and $r$ bigger subtrees $S_1, \ldots, S_r$; 
all edges also carry weights, weight $1 - \alpha$ and $\gamma$ are displayed here for the leaf edge below $\{L_k\}$ 
and the inner edge below $B$ only; the three associated possibilities for $T_{n+1}$ are displayed. 

Proposition 32 The alpha-gamma model for a G [0, 1] and 7 G [0, a] is a restricted exchange- 
able Markov branching model with dislocation measures of the form identified in Corollary^ with 
vi = (1 - a)PD* _ a _ 7 and uj = 7PD* _ a _ 7 , j > 2. 

Here, the Poisson-Dirichlet measure PD* e (ds) as cr-finite measure on was given by \27\ [2T] 

E[cj?; <7f 1 A<T [0 ,i] G <is], > -2a, a G (0, 1), 
on the interior of the parameter range, where $(\sigma_t, t \ge 0)$ is a stable subordinator with Laplace 
transform $\mathbb{E}[e^{-\lambda\sigma_t}] = e^{-t\lambda^{\alpha}}$ and $\Delta\sigma_{[0,1]}$ is the decreasing rearrangement of the jumps $\Delta\sigma_t = 
\sigma_t - \sigma_{t-}$, $t \in [0,1]$. A separate, but compatible definition on the boundary of the parameter 
a G {0,1} can be given. More importantly, the binary case PD* _ 2a ,(ds) is the ranked beta 
measure on {(x,l -x,0,...),x G (1/2,1)} C S l with density 2ar a (l - x)~ a l(i/2,i)(a0; the 
associated Markov branching model is Aldous's [I] beta-splitting model. 



p n {ni, ...,n k )-{L- a) T(n+1 _ a) r(i- 7 /a) 1 li=i r(i-a) > 71 G ' n ~ r n 

p n (ni, . . . ,n fc j - 7r(n+i- a ) r(i- 7 /a) lli=l r(i-a) > ^ e ' n - rn 



Proof. We claim that the distribution of the partition Tl n of T n ~ Qn' 1 at [n] is given by 

p(n„, = tt) = 

and that (Qn' 7 ,n > 2) has the labelled Markov branching property 

k 

P(n n = TT, = Si, . . . , SI = Sfc) = ^(#7n, . . . , #7T fc ) J] Q^({sJ), TT G V 3 n , 

i=l 

where Q^f 1 is the push-forward of QSJ. under the natural bijection on the set of hierarchies 
induced by the increasing bijection from [#7Tj] to 7Tj. 

We show this by induction on n. Specifically, for n = 2, this is trivial, for n = 3 we have e.g. 

p(n 3 = {{i, 3}, {2}}) = p(n 3 = {{i}, {2, 3}}) = Izf^, p(n 3 = {{i, 2}, {3}}) 



2- a " 2- a 

If the claim holds for n, we can apply the growth rules and the induction hypothesis to see 

P(II n+1 = {[n},{n+ 1}}, = si, 5 2 " +1 = {{n + 1}}) = -^_Q n ({ Sl })Q {n+1} ({n + 1}), 

and for tt = (tti, . . . , irk) G Vn, j = 1, 2, and hierarchies Sj of 7Tj, i 7^ i', and s,/ of tt^ U {n + 1}, 
P(II n+1 = (tti, . . . , 7T fe , {n + 1}), = Sl , . . . , S£ +1 = s fe , SJ+l = {{n + 1}}) 

= (fc ~ 1)a ~ V n(#7n, . . . , #vr fc )Q {n+1} ({{n + 1}}) ]J Q„, ({*}), 
n — a 1 J x A - 

i=i 

P(n n+1 = ( 7 ri,...,7r^U{n + l} 5 ...,7r fe ),^ +1 = si,...,^+ 1 = s ifc ) 

= — -PL(#^1, ■ ■ #Tfc)<9 v U{n+l}({Si'}) TT ^({Sj}), 

n — a 

as conditionally given that the insertion of n + 1 is in subtree S 1 ™, it is just as an insertion of 
#7Tj/ + 1 into T#n / 1 pushed forward from [#7Tj/ + 1] to 7Tj/ U {n + 1}. The induction proceeds. □ 

6.2 The skewed Poisson-Dirichlet model extending the alpha-gamma model 

Proposition 32 suggests to introduce a three-parameter family of restricted exchangeable fragmentation 
trees that we call the skewed Poisson-Dirichlet model, by setting 

$$\nu_1 = \lambda\,\mathrm{PD}^*_{\alpha,\theta}, \qquad \nu_j = (1-\lambda)\,\mathrm{PD}^*_{\alpha,\theta}, \quad j \ge 2,$$

for $\alpha \in [0,1]$, $\theta \ge -2\alpha$ and $\lambda \in [0,1]$. When $\lambda = (1-\alpha)/(1-\theta-2\alpha)$ and $\theta = -\alpha-\gamma$, this is the 
alpha-gamma model; when $\lambda = 1/2$, this is the exchangeable Poisson-Dirichlet model studied 
in [26, IH]. We will use parameterisations by $(\alpha, \theta, \lambda)$ and $(\alpha, \gamma, \lambda)$, where $\gamma = -\alpha - \theta$. We can 
apply Theorem 7 to obtain a strong convergence result: 
Corollary 33 Let $(T_n, n \ge 1)$ be a consistent family of skewed Poisson-Dirichlet trees for 
parameters $0 < \alpha < 1$, $0 < \gamma = -\alpha - \theta \le \alpha$ and $0 \le \lambda \le 1$. Then 

$$\frac{T_n^{\circ}}{n^{\gamma}} \to \mathcal{T} \quad \text{in probability, in the Gromov-Hausdorff sense,}$$

where $\mathcal{T}$ is a $\gamma$-self-similar CRT with dislocation measure 

$$\nu(ds) = \Big(\lambda + (1-2\lambda)\sum_{i \ge 1} s_i^2\Big)\,\mathrm{PD}^*_{\alpha,\theta}(ds).$$

Regarding the alpha model, $\alpha \in (0,1)$, $\theta = -2\alpha$, $\lambda = 1 - \alpha$, this confirms in part a conjecture 
formulated in [30]. Another interesting feature of the skewed Poisson-Dirichlet model relates 
to sampling consistency. Here we say that a family of unlabelled random trees $(T_n^{\circ}, n \ge 1)$ is 
sampling consistent if the tree $T_n^{\circ}$ with a uniformly chosen leaf removed is distributed as $T_{n-1}^{\circ}$. 
For consistent trees with exchangeable labels such as the exchangeable Poisson-Dirichlet model 
this is trivially so, but it also holds for the alpha-gamma model, which includes non-exchangeable trees. 
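
The parameter relations of this subsection can be checked mechanically. The sketch below converts alpha-gamma parameters $(\alpha, \gamma)$ into skewed Poisson-Dirichlet parameters $(\theta, \lambda)$ using only the two formulas $\theta = -\alpha - \gamma$ and $\lambda = (1-\alpha)/(1-\theta-2\alpha)$ stated above; the function name `skewed_from_alpha_gamma` is a hypothetical helper introduced for illustration.

```python
from fractions import Fraction


def skewed_from_alpha_gamma(alpha, gamma):
    """Map alpha-gamma parameters to (theta, lambda) of the skewed model.

    Uses theta = -alpha - gamma and lambda = (1 - alpha)/(1 - theta - 2*alpha).
    The denominator equals 1 - alpha + gamma, so it must be non-zero.
    """
    theta = -alpha - gamma
    lam = (1 - alpha) / (1 - theta - 2 * alpha)
    return theta, lam


# Alpha model: gamma = alpha gives theta = -2*alpha and lambda = 1 - alpha.
theta, lam = skewed_from_alpha_gamma(Fraction(1, 3), Fraction(1, 3))
print(theta, lam)
```

With this map, the ratio $(1-\lambda)/\lambda$ equals $\gamma/(1-\alpha)$, which matches the ratio of the weights of $\nu_j$, $j \ge 2$, and $\nu_1$ in Proposition 32, and $\lambda = 1/2$ occurs exactly when $\gamma = 1 - \alpha$.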

Proposition 34 The skewed Poisson-Dirichlet model is sampling consistent only for parameters 
that reduce it to the exchangeable Poisson-Dirichlet model or to the alpha-gamma model. 

Proof. By Corollary [SJ the skewed Poisson-Dirichlet model has dislocation measure 

k = I (XK s (-nv°^) + (i-X)K s (-nv 1 ^))PD* ad (ds). 

From this, we can calculate splitting rules. Specifically, we can calculate the distribution of 
the ranked sequence S n = (#IT n> i, . . . ,#11^)^ of block sizes of U n = (II nj i, . . . ,H n ,K n ) by 
summing @ over partitions of equal ranked sequence of block sizes and obtain 

p ( s 2 = (i, i)) = i, ns 3 = (1,1,1)) = ^^), p(s 3 = (2.D)- (1+A)(1 - a) 



T{Si = (1,1,1,1)) = P(S4 = (2 , d, - a + «>< 2 « + w - ») 



KS 4 = (2,2))= ™-°> 2 H^RD) ^'-*-'' , 

where $D_3$ and $D_4$ are normalisation constants of the form $a_3\lambda + b_3$ and $a_4\lambda + b_4$. Using the 
criterion of [20], sampling consistency requires, in particular, that 

$$\mathbb{P}(S_3 = (1,1,1)) = \mathbb{P}(S_4 = (1,1,1,1)) + \tfrac{1}{2}\,\mathbb{P}(S_4 = (2,1,1)) + \tfrac{1}{4}\,\mathbb{P}(S_4 = (3,1))\,\mathbb{P}(S_3 = (1,1,1)),$$

which upon multiplication by $D_3 D_4$ is a quadratic equation in $\lambda$. Common coefficients of all 
terms include $(1-\alpha)$ and $(\theta + 2\alpha)$. For $\alpha < 1$ and $\theta > -2\alpha$, the quadratic equation has 
the two solutions $\lambda = 1/2$ and $\lambda = (1-\alpha)/(1-\theta-2\alpha)$ corresponding, respectively, to the 
Poisson-Dirichlet and alpha-gamma models, so no other models can be sampling consistent. 

The exchangeable Poisson-Dirichlet model is trivially sampling consistent. The alpha-gamma model 
was shown in [11] to be sampling consistent. In the excluded case $\alpha = 1$, models for all $\theta$ collapse 
to the same deterministic model where all leaves are connected directly to a single branch point 
[26]. For the binary case $\theta = -2\alpha$, which we also had to exclude, we need to consider $S_5$; this gives 
similar quadratic equations and also leads to the required conclusion that only the alpha model 
$\lambda = 1 - \alpha$ and the beta-splitting model $\lambda = 1/2$ are sampling consistent. We leave the details 
to the reader. □ 
A Proof of Lemma 27 

The first part of Lemma 27 is a straightforward consequence of [16], see also [20, Lemma 8]. 
The second part generalises [20, Lemma 12]. For the convenience of the reader, we reproduce 
the proof here, rewritten for our higher generality. 

Let $N_y(t_1, t_2)$ denote the number of jumps of $\xi$ of size at least $y$ in the time interval $[t_1, t_2]$, and 
$N_y^{\varepsilon,\tau}(t_1, t_2)$ the number of jumps of $\exp(-\varepsilon)(1 - \exp(-\xi))$ of size at least $y$ in the same 
time interval. 

Step 1. Large deviations for Ny' T (0, t"). 

Lemma 35 For all $x > 0$ and $0 < y \le 1$, 

$$\mathbb{P}\Big(N_y^{\varepsilon,\tau}(0, \tau'') > (1+x)\,C_\Lambda\sum_{i=0}^{[\tau'']}\exp(-\varrho(\varepsilon + \xi_i))\,\Lambda(y)\Big) \le \exp(-a_x\Lambda(y)),$$

where $a_x := (1+x)\ln(1+x) - x > 0$. 



Proof. Let denote the a-field generated by (e, r, t" A t) and £ until time t, and fto T the 
one generated by (e,r, r") and £, and observe that 

[r"] _ [r»] 

^• T (0,r") < £^' T (M + 1) < X> ycxp(e+Si) (M + !)■ 



i=0 



8=0 



Conditional on $\mathcal{F}_i^{\varepsilon,\tau}$, $N_{y\exp(\varepsilon+\xi_i)}(i, i+1)$ is a Poisson random variable with mean $\Lambda(y\exp(\varepsilon + \xi_i))$. 
But for any Poisson random variable $P$ with mean $\lambda$, one has 

$$\mathbb{E}[\exp(\gamma P - (1+x)\gamma\lambda)] = \exp\big((\exp(\gamma) - 1 - (1+x)\gamma)\lambda\big), \quad \forall\gamma \in \mathbb{R}.$$

In particular, when $\gamma = \ln(1+x)$, $\exp(\gamma) - 1 - (1+x)\gamma = -a_x < 0$ and the expectation is smaller 
than 1. Hence, for all $n \in \mathbb{N}$, using the tail bounds of $\Lambda$ for the first inequality, we obtain, for 
all $y \le 1$, 

[r"]An [r"]An 

^expCe+^^^ + l) > (1 + ^)Ca exp(- e (e + 
i=0 8=0 

([r"]An [r"]An 
N yex P (e+c i ){^ i + 1 )^( 1 + x ) Yl A(i/exp(e + &)) 
i=0 i=0 



< E 



< E 



[r"]An 

exp j 7 j E (A^ exp ( £+?i )(M + 1) - (l + x)A(yexp(e + £i))) 

8=0 
r"]A(n-l) 

<-P I" I ••• I I E[exp( 7 l {[r //]> n} (A^ yexp(£+?n) (n,n + l) 

i=0 



-(l + x)A(yexp(e + £„)))) 



< 



< exp(-a cc A(y)), 



the last line being obtained by induction: at each step but the last we use the upper bound 
1 for the conditional expectation and for the last step, we use the upper bound $\exp(-a_x\Lambda(y))$ 
for the expectation $\mathbb{E}[\exp(\gamma(N_y(0,1) - (1+x)\Lambda(y)))]$. It remains to let $n \to \infty$ in the first 
probability involved in the above sequence of inequalities and to use Fatou's lemma. □ 
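
The Poisson exponential-moment identity used in this proof, and the Chernoff-type tail bound it yields, can be checked numerically. The sketch below sums the Poisson series directly; the function names (`a`, `poisson_pmf`, `exp_moment`, `tail`) and the truncation level `kmax` are assumptions made for illustration only.

```python
import math


def a(x):
    """Rate a_x = (1 + x) ln(1 + x) - x from Lemma 35."""
    return (1 + x) * math.log(1 + x) - x


def poisson_pmf(lam, kmax):
    """Return [P(P = 0), ..., P(P = kmax)] for P ~ Poisson(lam)."""
    pmf = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf


def exp_moment(gamma, x, lam, kmax=300):
    """Series value of E[exp(gamma * P - (1 + x) * gamma * lam)]."""
    shift = (1 + x) * gamma * lam
    return sum(p * math.exp(gamma * k - shift)
               for k, p in enumerate(poisson_pmf(lam, kmax)))


def tail(x, lam, kmax=300):
    """P(P > (1 + x) * lam), with the series truncated at kmax."""
    return sum(p for k, p in enumerate(poisson_pmf(lam, kmax))
               if k > (1 + x) * lam)


x, lam = 1.0, 3.0
bound = math.exp(-a(x) * lam)          # exp(-a_x * lam)
print(exp_moment(math.log(1 + x), x, lam), bound, tail(x, lam))
```

For $\gamma = \ln(1+x)$ the series value agrees with $\exp(-a_x\lambda)$, and by Markov's inequality the same quantity bounds the tail probability $\mathbb{P}(P > (1+x)\lambda)$.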
Step 2. Large deviations for $\mathbb{E}[K_n(\varepsilon, \tau, \tau') \mid \mathcal{F}_\infty^{\varepsilon,\tau}]$. 

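Lemma 36 Let $B_\alpha := \sum_{k=1}^{\infty}\exp(-4^{-1}a_1k^{\alpha/2})$ with $a_1 = 2\ln 2 - 1$. Then for all $x \ge 1$ and all 
integers $n$ large enough, 

$$\mathbb{P}\big(\mathbb{E}[K_n(\varepsilon,\tau,\tau') \mid \mathcal{F}_\infty^{\varepsilon,\tau}] > (1+x)(Y(\varepsilon,\tau,\tau') - 1)\Lambda(n^{-1})\big) \le (1 + B_\alpha)\exp(-4^{-1}a_1x\Lambda(n^{-1})).$$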

Proof. According to formula (4) of [16J , 

E[K n (e,T,r')\^ T ;r]=n J (1 - y^A^O, r")dy < N^JO, r") + n J N^(0,r")dy. 

Hence, setting 5 := C\ Ya=o ex P(-Q( E + &))> 
P(E [K n (e,T,T')\T £ T ;r] > (l + x)(l + A a )SA(n~ 1 )) 

< P (iV^(0, r") > (1 + x)SA(n- 1 )) + P (n N^ T (0, r")dy > (1 + x)A a SK{n' l )^j . 

The first probability in the right-hand side is smaller than exp(— a x A(n _1 )) by Lemma [351 To 

bound the second probability, we use n fy^ + ^ n Ny' T (0, r")dy < N*'^ k+1 ^ (0, 

gives 

P jf 7 A>' r (0, r")dy > 4*(1 + aO^n -1 )^ 

oo 
k=l 

Since A is regularly varying at with index —a, we have, provided n is large enough, that 
A(n -1 )(Jfe + l) a/2 < 2A(((n(fc + l)) -1 )) and A(((fc + ljn)- 1 ) < 2A(n- 1 )(A; + for all fc > 1 
(to see this, use, e.g. Potter's theorem, Theorem 1.5.6, |10| . Combined with Lemma [351 this 
implies that the above sum of probabilities is smaller than 

$$\sum_{k=1}^{\infty}\exp\big(-a_x\Lambda((n(k+1))^{-1})\big) \le \sum_{k=1}^{\infty}\exp\big(-2^{-1}a_x\Lambda(n^{-1})(k+1)^{\alpha/2}\big).$$

Last, the exponential in the latter sum can be split in two, using $(k+1)^{\alpha/2} \ge 2^{-1}(k^{\alpha/2} + 1)$, to 
get the upper bound 

$$\exp\big(-4^{-1}a_x\Lambda(n^{-1})\big)\sum_{k=1}^{\infty}\exp\big(-4^{-1}a_x\Lambda(n^{-1})k^{\alpha/2}\big),$$

which is smaller than $\exp(-4^{-1}a_1x\Lambda(n^{-1}))B_\alpha$ for all $x \ge 1$ ($a_x \ge a_1x$ for $x \ge 1$) and $n$ large enough. 

□ 
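
The two elementary inequalities invoked at the end of this proof, $(k+1)^{\alpha/2} \ge 2^{-1}(k^{\alpha/2}+1)$ and $a_x \ge a_1 x$ for $x \ge 1$, can be spot-checked on a grid. This is only a numerical sanity check under the assumption $0 < \alpha \le 1$, not a proof; the helper names `split_gap` and `rate_gap` are hypothetical.

```python
import math


def a(x):
    """Rate a_x = (1 + x) ln(1 + x) - x."""
    return (1 + x) * math.log(1 + x) - x


def split_gap(k, alpha):
    """(k + 1)^(alpha/2) - (k^(alpha/2) + 1)/2, expected to be >= 0."""
    b = alpha / 2
    return (k + 1) ** b - 0.5 * (k ** b + 1)


def rate_gap(x):
    """a_x - a_1 * x, expected to be >= 0 for x >= 1."""
    return a(x) - a(1.0) * x


worst_split = min(split_gap(k, al)
                  for k in range(1, 2001) for al in (0.1, 0.5, 0.9, 1.0))
worst_rate = min(rate_gap(1 + 0.01 * i) for i in range(2000))
print(worst_split, worst_rate)
```

The first inequality holds because $(k+1)^{\alpha/2}$ dominates both $k^{\alpha/2}$ and $1$, hence their average; the second because $a_x/x$ is increasing in $x$.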

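Step 3. Proof of inequality (19). To start with, fix $x \ge 1$, $n \in \mathbb{N}$, and note that 

$$\begin{aligned}
\mathbb{P}\big(K_n(\varepsilon,\tau,\tau') > (1+x)Y(\varepsilon,\tau,\tau')\Lambda(n^{-1})\big)
&\le \mathbb{P}\big(\mathbb{E}[K_n(\varepsilon,\tau,\tau') \mid \mathcal{F}_\infty^{\varepsilon,\tau}] > (1+x)(Y(\varepsilon,\tau,\tau') - 1)\Lambda(n^{-1})\big) \\
&\quad + \mathbb{P}\big(K_n(\varepsilon,\tau,\tau') - \mathbb{E}[K_n(\varepsilon,\tau,\tau') \mid \mathcal{F}_\infty^{\varepsilon,\tau}] > (1+x)\Lambda(n^{-1})\big). \qquad (24)
\end{aligned}$$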
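Lemma 36 gives an upper bound for the first probability provided $n$ is large enough. To get an 
upper bound for the second probability, we use a result on urn models (Devroye [12], Section 6) 
which ensures that 

$$\mathbb{P}\big(K_n(\varepsilon,\tau,\tau') - \mathbb{E}[K_n(\varepsilon,\tau,\tau') \mid \mathcal{F}_\infty^{\varepsilon,\tau}] > y \,\big|\, \mathcal{F}_\infty^{\varepsilon,\tau}\big) \le \exp\left(-\frac{y^2}{2\,\mathbb{E}[K_n(\varepsilon,\tau,\tau') \mid \mathcal{F}_\infty^{\varepsilon,\tau}] + 2y/3}\right)$$

for all $y > 0$, $n \in \mathbb{N}$. This implies that for all $m \ge 1$, there exists some deterministic constant 
$B_m$ depending only on $m$ such that 

$$\begin{aligned}
&\mathbb{P}\big(K_n(\varepsilon,\tau,\tau') - \mathbb{E}[K_n(\varepsilon,\tau,\tau') \mid \mathcal{F}_\infty^{\varepsilon,\tau}] > (1+x)\Lambda(n^{-1}) \,\big|\, \mathcal{F}_\infty^{\varepsilon,\tau}\big) \\
&\qquad \le B_m\left(\frac{\mathbb{E}[K_n(\varepsilon,\tau,\tau') \mid \mathcal{F}_\infty^{\varepsilon,\tau}] + (1+x)\Lambda(n^{-1})}{((1+x)\Lambda(n^{-1}))^2}\right)^m \\
&\qquad \le 2^{m-1}B_m\,\frac{\big(\mathbb{E}[K_n(\varepsilon,\tau,\tau') \mid \mathcal{F}_\infty^{\varepsilon,\tau}]\big)^m + ((1+x)\Lambda(n^{-1}))^m}{((1+x)\Lambda(n^{-1}))^{2m}} \\
&\qquad \le 2^{m-1}B_m\,\frac{\mathbb{E}[(K_n(\varepsilon,\tau,\tau'))^m \mid \mathcal{F}_\infty^{\varepsilon,\tau}] + ((1+x)\Lambda(n^{-1}))^m}{((1+x)\Lambda(n^{-1}))^{2m}},
\end{aligned}$$

the last line being obtained by Jensen's inequality. We then take expectations on both sides of the 
resulting inequality. Theorem 6.3 of [16] ensures that $\mathbb{E}[(K_n(\varepsilon,\tau,\tau'))^m \mid \varepsilon, \tau] \le \mathbb{E}[(K_n(0,0,\infty))^m] \sim 
(\Lambda(n^{-1}))^m$ (up to a constant). Therefore, we have 

$$\mathbb{P}\big(K_n(\varepsilon,\tau,\tau') - \mathbb{E}[K_n(\varepsilon,\tau,\tau') \mid \mathcal{F}_\infty^{\varepsilon,\tau}] > (1+x)\Lambda(n^{-1})\big) \le B_{m,\Lambda}\big((1+x)\Lambda(n^{-1})\big)^{-m}, \qquad (25)$$

where $B_{m,\Lambda}$ depends only on $m$ and $\Lambda$. 

Next, recall the upper bound given by Lemma 36 for the first probability involved in the 
right-hand side of (24). Together with the upper bound (25), it leads to the existence of $B'_{m,\Lambda}$ 
such that 

$$\mathbb{P}\big(K_n(\varepsilon,\tau,\tau') > (1+x)Y(\varepsilon,\tau,\tau')\Lambda(n^{-1})\big) \le B'_{m,\Lambda}\,x^{-m}(\Lambda(n^{-1}))^{-m}$$

for all $x \ge 1$ and $n$ large enough, say $n \ge n_0$. Since $\Lambda(n^{-1}) \sim n^{\alpha}\ell(n)$ when $n \to \infty$, this upper 
bound is in turn bounded from above by $x^{-m}n^{1-\alpha m}$ up to some constant, which is the required 
result (19). 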
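
The passage from the exponential urn-model bound to a polynomial bound with a constant $B_m$ can be made explicit as follows. This is one admissible choice of constant, written here as a worked step; it is not claimed to be the constant the authors had in mind. Below, $E$ abbreviates the conditional expectation of $K_n(\varepsilon,\tau,\tau')$ and $y = (1+x)\Lambda(n^{-1})$.

```latex
% Step 1: sup_{u>0} u^m e^{-u} is attained at u = m, hence
\[
  e^{-u} \le \Big(\frac{m}{e}\Big)^{m} u^{-m} \qquad (u > 0).
\]
% Step 2: the exponent in the urn-model bound satisfies
\[
  u = \frac{y^{2}}{2E + 2y/3} \ge \frac{y^{2}}{2(E + y)} .
\]
% Step 3: combining both gives a polynomial bound with B_m = (2m/e)^m:
\[
  \exp\Big(-\frac{y^{2}}{2E + 2y/3}\Big)
  \le \Big(\frac{2m}{e}\Big)^{m} \Big(\frac{E + y}{y^{2}}\Big)^{m}.
\]
```

The subsequent factor $2^{m-1}$ comes from the convexity inequality $(a+b)^m \le 2^{m-1}(a^m + b^m)$ for $a, b \ge 0$ and $m \ge 1$.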

Finally, inequality (19) is also true when $n < n_0$ (for all $x \ge 1$), since $K_n(\varepsilon,\tau,\tau') \le n \le n_0$ 
and $Y(\varepsilon,\tau,\tau') \ge 1$, and therefore the probability $\mathbb{P}(K_n(\varepsilon,\tau,\tau') > (1+x)Y(\varepsilon,\tau,\tau')\Lambda(n^{-1}))$ is 
null whenever $1 + x > n_0(\Lambda(n^{-1}))^{-1}$. 

This completes the proof of Lemma 27. □ 